mirror of https://github.com/fenago/data-science.git (synced 2026-05-04 00:22:32 +00:00)

added
+1
-47
@@ -641,20 +641,7 @@ dataset, refer to the following note. Let\'s get started:


Caption: List of columns and their assigned data types

Note

The preceding output has been truncated.

From *Lab 10*, *Analyzing a Dataset*, you know that the `Id`, `MSSubClass`, `OverallQual`, and `OverallCond` columns have been incorrectly classified as numerical variables. They have a finite number of unique values and you can't perform any meaningful mathematical operations on them. For example, it doesn't make sense to add, subtract, multiply, or divide two different values from the `Id` column. Therefore, you need to convert them into categorical variables.

6. Using the `astype()` method, convert the `'Id'` column into a categorical variable, as shown in the following code
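The book's own snippet for this step sits outside this excerpt; a minimal sketch of the conversion, using a tiny stand-in DataFrame (the variable name `df` and its contents are assumptions):

```
import pandas as pd

# Tiny stand-in for the AMES Housing DataFrame used in the exercise.
df = pd.DataFrame({'Id': [1, 2, 3], 'MSSubClass': [20, 60, 20]})

# Convert the 'Id' column to the pandas 'category' dtype.
df['Id'] = df['Id'].astype('category')
print(df.dtypes)
```

The same `astype('category')` call can be repeated for `MSSubClass`, `OverallQual`, and `OverallCond` in the later steps.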
@@ -694,14 +681,6 @@ dataset, refer to the following note. Let\'s get started:


Caption: List of categories for the four newly converted variables

Now, these four columns have been converted into categorical variables. From the output of *Step 5*, we can see that there are a lot of variables of the `object` type. Let's have a look at them and see if they need to be converted as well.

9. Create a new DataFrame called `obj_df` that will only contain variables of the `object` type using the `select_dtypes` method along with the
@@ -1348,15 +1327,7 @@ You should get the following output:


Caption: Rows with missing values in CustomerID

This time, all the transactions look normal, except that they are missing values for the `CustomerID` column; all the other variables have been filled in with values that seem genuine. There is no other way to infer the missing values for the `CustomerID` column. These rows represent almost 25% of the dataset, so we can't remove them.

However, most algorithms require a value for each observation, so you need to provide one for these cases. We will use the `.fillna()` method from `pandas` to do this. Provide the value to be imputed as `Missing` and use
@@ -1385,15 +1356,6 @@ You should get the following output:


Caption: Summary of missing values for each variable

You have successfully fixed all the missing values in this dataset. These methods also work when we want to handle missing numerical variables. We will look at this in the following exercise. All you need to do is provide a numerical value when you want to impute a value with `.fillna()`.
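As a quick, hedged illustration of the imputation described above (the DataFrame contents here are invented, not the Online Retail data):

```
import numpy as np
import pandas as pd

# Stand-in for the Online Retail DataFrame used in this section.
df = pd.DataFrame({'InvoiceNo': ['536365', '536366', '536367'], \
                   'CustomerID': [17850.0, np.nan, 13047.0]})

# Impute the value 'Missing' wherever CustomerID is absent.
df['CustomerID'] = df['CustomerID'].fillna('Missing')

# For a numerical column, pass a number instead, for example the median.
print(df.isna().sum())
```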
Exercise 11.04: Fixing Missing Values for the Horse Colic Dataset
-----------------------------------------------------------------
@@ -1699,48 +1661,40 @@ The following figure illustrates a list of unique values for gaming:


Caption: List of unique values for gaming

The following figure displays the data types of each column:



Caption: Data types of each column

The following figure displays the updated data types of each column:



Caption: Data types of each column

The following figure displays the number of missing values for numerical variables:



Caption: Number of missing values for numerical variables

The following figure displays the list of unique values for `int_corr`:



Caption: List of unique values for 'int_corr'

The following figure displays the list of unique values for numerical variables:



Caption: List of unique values for numerical variables

The following figure displays the number of missing values for numerical variables:



Caption: Number of missing values for numerical variables

Summary
=======
@@ -38,14 +38,7 @@ You should get the following output.


Caption: First five rows of the Online Retail dataset

Next, we are going to load all the public holidays in the UK into another `pandas` DataFrame. From *Lab 10*, *Analyzing a Dataset*, we know the records of this dataset only cover the years 2010 and 2011, so we are going to extract public holidays for those two years. We need to do this in two separate calls because the API provided by `date.nager` only returns one year at a time.

Let's focus on 2010 first:
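The loading code itself falls outside this excerpt; a hedged sketch of what such a call could look like is shown below. The exact `date.nager` endpoint URL and version are assumptions, not taken from the book:

```
import pandas as pd

# Assumed endpoint format for the date.nager public-holiday API (UK = 'GB').
url_2010 = "https://date.nager.at/api/v3/PublicHolidays/2010/GB"

# read_json accepts a URL and loads the JSON response into a DataFrame.
uk_holidays_2010 = pd.read_json(url_2010)
print(uk_holidays_2010.head())
```

The same call with `2011` in the URL would cover the second year.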
@@ -759,17 +752,6 @@ You should get the following output:
```
30
```

`30` is the number of unique values for the `Country_bin` column. So we reduced the number of unique values in this column from `38` to `30`.

We just saw how to group categorical values together, but the same process can be applied to numerical values as well. For instance, it is quite common to group people's ages into bins such as 20s (20 to 29 years old), 30s (30 to 39), and so on.
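A hedged sketch of such numerical binning with `pandas.cut`; the ages below are invented purely for illustration:

```
import pandas as pd

# Illustrative ages only; not from the book's datasets.
ages = pd.Series([23, 35, 31, 47, 52, 29])

# Bin the ages into decades: [20, 30) -> '20s', [30, 40) -> '30s', and so on.
age_bins = pd.cut(ages, bins=[20, 30, 40, 50, 60], \
                  right=False, labels=['20s', '30s', '40s', '50s'])
print(age_bins.value_counts())
```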
Have a look at *Exercise 12.02*, *Binning the YearBuilt variable from the AMES Housing dataset*.
@@ -1768,8 +1750,3 @@ of a dataset are and identifying data quality issues. We saw how to
handle and fix some of the most frequent issues (duplicate rows, type conversion, value replacement, and missing values) using `pandas`' APIs. Finally, we went through several feature engineering techniques.

The next lab opens a new part of this course that presents data science use cases end to end. *Lab 13*, *Imbalanced Datasets*, will walk you through an example of an imbalanced dataset and how to deal with such a situation.
@@ -148,10 +148,6 @@ Classification*, and you will look closely at the metrics:
```

After the categorical values are transformed, they must be combined with the scaled numerical values of the data frame to get the feature-engineered dataset.

10. Create the independent variables, `X`, and dependent variables, `Y`, from the combined dataset for modeling, as in the following code snippet:
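The snippet itself is outside this excerpt; a hedged sketch of such a step is given below. All of the column and variable names (`scaled_num`, `encoded_cat`, `approved`) are invented for illustration:

```
import pandas as pd

# Stand-ins for the scaled numerical and encoded categorical frames
# produced by the earlier feature engineering steps.
scaled_num = pd.DataFrame({'income_scaled': [0.2, 0.8], 'age_scaled': [0.5, 0.1]})
encoded_cat = pd.DataFrame({'housing_yes': [1, 0], 'housing_no': [0, 1]})
label = pd.Series([1, 0], name='approved')

# Combine the feature blocks, then split into X (features) and Y (label).
combined = pd.concat([scaled_num, encoded_cat, label], axis=1)
X = combined.drop('approved', axis=1)
Y = combined['approved']
print(X.shape, Y.shape)
```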
@@ -171,13 +167,9 @@ Classification*, and you will look closely at the metrics:
The output is as follows:



Caption: The independent variables and the combined data (truncated)

We are now ready for the modeling task. Let's first import the necessary packages.
@@ -1693,45 +1693,6 @@ The following steps will help you complete this exercise:
From this exercise, you may come up with a few questions:

- How do you think we can improve the classification results using ICA?
- Increasing the number of components results in a marginal increase in the accuracy metrics.
- Are there any other side effects because of the strategy adopted to improve the results?

Increasing the number of components also results in a longer training time for the logistic regression model.

Factor Analysis
---------------

Factor analysis is a technique that achieves dimensionality reduction by grouping variables that are highly correlated. Let's look at an example from our context of predicting advertisements.

In our dataset, there could be many features that describe the geometry (the size and shape of an image in the ad) of the images on a web page. These features can be correlated because they refer to specific characteristics of an image.

Similarly, there could be many features that describe the anchor text or phrases occurring in a URL, which are highly correlated. Factor analysis looks at correlated groups such as these in the data and then groups them into latent factors. Therefore, if there are 10 raw features describing the geometry of an image, factor analysis will group them into one feature that characterizes the geometry of an image. Each of these groups is called a factor. As many correlated features are combined to form a group, the resulting number of features will be much smaller in comparison with the original dimensions of the dataset.

Let's now see how factor analysis can be implemented as a technique for dimensionality reduction.
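Before the exercise, here is a minimal, hedged sketch of the scikit-learn API involved; the toy data and the choice of two factors are assumptions for illustration only:

```
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Toy data: 100 samples, 10 correlated "geometry-like" features.
rng = np.random.RandomState(42)
base = rng.normal(size=(100, 2))
X = np.hstack([base + 0.1 * rng.normal(size=(100, 2)) for _ in range(5)])

# Reduce the 10 correlated columns to 2 latent factors.
fa = FactorAnalysis(n_components=2, random_state=42)
X_factors = fa.fit_transform(X)
print(X_factors.shape)  # (100, 2)
```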
Exercise 14.06: Dimensionality Reduction Using Factor Analysis
@@ -2015,18 +1976,7 @@ You should get the following output:


Caption: Sample data frame

What we will do next is sample some data points with the same shape as the data frame we created.

Let's sample some data points from a normal distribution that has a mean of `0` and a standard deviation of `0.1`. We touched briefly on normal distributions in *Lab 3, Binary Classification*. A normal distribution has two parameters. The first one is the mean, which is the average of all the data in the distribution, and the second one is the standard deviation, which is a measure of how spread out the data points are.
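To make these two parameters concrete, here is a hedged sketch of drawing such samples with NumPy; the `(5, 3)` shape is an arbitrary choice for illustration:

```
import numpy as np

# Samples from a normal distribution with mean 0 and standard deviation 0.1.
rng = np.random.RandomState(8)
samples = rng.normal(loc=0, scale=0.1, size=(5, 3))
print(samples)
```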
By assuming a mean and standard deviation, we will be able to draw samples from a normal distribution using the
@@ -19,109 +19,6 @@ where we will try to predict whether a credit card application will be
approved.

Introduction
============

In the previous lab, we learned various techniques, such as the backward elimination technique, factor analysis, and so on, that helped us to deal with high-dimensional datasets.

In this lab, we will further enhance our repertoire of skills with another set of techniques, called **ensemble learning**. We will be dealing with different ensemble learning techniques such as the following:

- Averaging
- Weighted averaging
- Max voting
- Bagging
- Boosting
- Blending

Ensemble Learning
=================

Ensemble learning, as the name suggests, is a method that combines several machine learning models to generate a superior model, thereby decreasing variability/variance and bias, and boosting performance.

Before we explore what ensemble learning is, let's look at the concepts of bias and variance with the help of the classical bias-variance quadrant, as shown here:



Caption: Bias-variance quadrant

Variance
--------

Variance is the measure of how spread out data is. In the context of machine learning, models with high variance imply that the predictions generated on the same test set will differ considerably when different training sets are used to fit the model. The underlying reason for high variability could be attributed to the model being attuned to specific nuances of the training data rather than generalizing the relationship between input and output. Ideally, we want every machine learning model to have low variance.

Bias
----

Bias is the difference between the ground truth and the average value of our predictions. A low bias indicates that the predictions are very close to the actual values. A high bias implies that the model has oversimplified the relationship between the inputs and outputs, leading to high error rates on test sets, which again is an undesirable outcome.
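A tiny numeric illustration of these two definitions, using invented numbers rather than anything from the lab:

```
import numpy as np

# Predictions for the same test point produced by a model trained on five
# different training sets; the ground truth is 10.
truth = 10.0
preds = np.array([7.9, 8.1, 8.0, 8.2, 7.8])

bias = preds.mean() - truth    # how far the average prediction is from the truth
variance = preds.var()         # how much the predictions spread around their mean
print(bias, variance)          # roughly -2.0 and 0.02
```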
*Figure 15.1* helps us to visualize the trade-off between bias and variance. The top-left corner is the depiction of a scenario where the bias is high and the variance is low. The top-right quadrant displays a scenario where both bias and variance are high. From the figure, we can see that when the bias is high, the predictions land further away from the truth, which in this case is the *bull's eye*. The amount of variance is manifested in whether the arrows are spread out or congregated in one spot.

Ensemble models combine many weaker models that differ in variance and bias, thereby creating a better model that outperforms the individual weaker models. Ensemble models exemplify the adage *the wisdom of the crowds*. In this lab, we will learn about different ensemble techniques, which can be classified into two types, that is, simple and advanced techniques:



Caption: Different ensemble learning methods

Business Context
----------------

You are working in the credit card division of your bank. The operations head of your company has requested your help in determining whether a customer is creditworthy or not. You have been provided with credit card operations data.

This dataset contains credit card applications with around 15 variables. The variables are a mix of continuous and categorical data pertaining to credit card operations. The label for the dataset is a flag, which indicates whether the application has been approved or not.

You want to fit some benchmark models and try some ensemble learning methods on the dataset to address the problem and come up with a tool for predicting whether or not a given customer should be approved for their credit application.

Exercise 15.01: Loading, Exploring, and Cleaning the Data
---------------------------------------------------------
@@ -783,71 +680,6 @@ the new combination of weights in *iteration 2*:


Caption: Classification report

In this exercise, we implemented the weighted averaging technique for ensemble learning. We did two iterations with the weights. We saw that in the second iteration, where we increased the weight of the logistic regression prediction from `0.6` to `0.7`, the accuracy actually improved from `0.89` to `0.90`. This is a validation of our assumption about the prominence of the logistic regression model in the ensemble. To check whether there is more room for improvement, we should again change the weights, just as we did in iteration `2`, and then validate against the metrics. We should continue these iterations until no further improvement is noticed in the metrics.
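A hedged sketch of the weighted-averaging idea described above; the probability arrays and the 0.7/0.3 weights are illustrative assumptions, not the lab's actual values:

```
import numpy as np

# Predicted probabilities of class 1 from two individual learners (toy values).
pred_logistic = np.array([0.8, 0.3, 0.6, 0.9])
pred_knn = np.array([0.6, 0.4, 0.7, 0.5])

# Weighted average, giving the logistic regression model more weight.
ensemble_prob = 0.7 * pred_logistic + 0.3 * pred_knn
ensemble_pred = (ensemble_prob >= 0.5).astype(int)
print(ensemble_pred)
```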
Comparing it with the metrics from the averaging method, we can see that the accuracy level has gone down from `0.91` to `0.90`. However, the recall value of class `1` has gone up from `0.91` to `0.92`, and the corresponding value for class `0` has gone down from `0.91` to `0.88`. It could be that the weights we applied have resulted in a marginal degradation of the results compared to what we got from the averaging method.

Looking at the results from a business perspective, we can see that with the increase in the recall value of class `1`, the card division is getting more creditworthy customers. However, this has come at the cost of increasing the risk, with `12%` (`100% - 88%`) of unworthy customers being tagged as creditworthy.

### Max Voting

The max voting method works on the principle of majority rule. In this method, the opinion of the majority rules the roost. In this technique, individual models, or, in ensemble learning jargon, individual learners, are fit on the training set and their predictions are then generated on the test set. Each individual learner's prediction is considered to be a vote. On the test set, whichever class gets the maximum number of votes is the ultimate winner. Let's demonstrate this with a toy example.

Let's say we have three individual learners that have learned on the training set. Each of them generates its predictions on the test set, which are tabulated in the following table. The predictions are either for class '1' or class '0':



Caption: Predictions for learners

In the preceding example, we can see that for `Example 1` and `Example 3`, the majority vote is for class '1', and for the other two examples, the majority vote is for class '0'. The final predictions are based on which class gets the majority vote. This method of voting, where we output a class, is called "hard" voting.

When implementing the max voting method using the `scikit-learn` library, we use a special function called `VotingClassifier()`. We provide the individual learners as input to `VotingClassifier` to create the ensemble model. This ensemble model is then used to fit the training set and finally to predict on the test set. We will explore the dynamics of max voting in *Exercise 15.04*, *Ensemble Model Using Max Voting*.
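A minimal, hedged sketch of hard voting with `VotingClassifier`; the choice of base learners and the synthetic data are assumptions for illustration:

```
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Hard voting: each learner casts one vote per test example.
ensemble = VotingClassifier(estimators=[('lr', LogisticRegression(max_iter=1000)), \
                                        ('knn', KNeighborsClassifier()), \
                                        ('rf', RandomForestClassifier(random_state=42))], \
                            voting='hard')
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))
```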
Exercise 15.04: Ensemble Model Using Max Voting
@@ -967,101 +799,6 @@ regression, KNN, and random forest:


Caption: Classification report

Advanced Techniques for Ensemble Learning
=========================================

Having learned simple techniques for ensemble learning, let's now explore some advanced techniques. Among the advanced techniques, we will be dealing with three different kinds of ensemble learning:

- Bagging
- Boosting
- Stacking/blending

Before we deal with each of them, there are some basic dynamics of these advanced ensemble learning techniques that need to be deciphered. As described at the beginning of the lab, the essence of ensemble learning lies in combining individual models to form a superior model. There are some subtle nuances in the way the superior model is generated in the advanced techniques. In these techniques, the individual models or learners generate predictions, and those predictions are used to form the final predictions. The individual models or learners that generate the first set of predictions are called **base learners** or **base estimators**, and the model that combines the predictions of the base learners is called the **meta learner** or **meta estimator**. The way in which the meta learner learns from the base learners differs for each of the advanced techniques. Let's understand each of the advanced techniques in detail.

Bagging
-------

Bagging is short for **B**ootstrap **Agg**regat**ing**. Before we explain how bagging works, let's describe what bootstrapping is. Bootstrapping has its etymological origins in the phrase *pulling oneself up by one's bootstraps*. The essence of this phrase is to make the best use of the available resources. In the statistical context, bootstrapping entails taking samples from the available dataset with replacement. Let's look at this concept with a toy example.

Suppose we have a dataset consisting of the 10 numbers from 1 to 10. We now need to create 4 different datasets of 10 numbers each from the available dataset. How do we do this? This is where the concept of bootstrapping comes in handy. In this method, we take samples from the available dataset one by one, replacing each number before taking the next sample. We continue with this until we get a sample with the number of data points we need.
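A hedged sketch of this toy example using NumPy's sampling with replacement; the random seed is an arbitrary choice:

```
import numpy as np

rng = np.random.RandomState(42)
data = np.arange(1, 11)  # the numbers 1 to 10

# Draw 4 bootstrapped samples of size 10, sampling with replacement each time.
bootstrap_samples = [rng.choice(data, size=10, replace=True) for _ in range(4)]
for sample in bootstrap_samples:
    print(sample)  # some numbers appear more than once, others not at all
```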
As we are replacing each number after it is selected, there is a chance that we might have more than one occurrence of a given data point in a sample. This is explained by the following figure:



Caption: Bootstrapping

Now that we have understood bootstrapping, let's apply this concept to a machine learning context. Earlier in the lab, we discussed that ensemble learning helps in reducing the variance of predictions. One way that variance can be reduced is by averaging out the predictions from multiple learners. In bagging, multiple subsets of the data are created using bootstrapping. On each of these subsets of data, a base learner is fitted and predictions are generated. These predictions from all the base learners are then averaged to get the meta learner or the final predictions.

When implementing bagging, we use a function called `BaggingClassifier()`, which is available in the `scikit-learn` library. Some of the important arguments that are provided when creating an ensemble model include the following:

- `base_estimator`: This argument defines the base estimator to be used.
- `n_estimators`: This argument defines the number of base estimators that will be used in the ensemble.
- `max_samples`: The maximum size of the bootstrapped sample for fitting the base estimator is defined using this argument. It is expressed as a proportion (0.8, 0.7, and so on).
- `max_features`: When fitting multiple individual learners, it has been found that randomly selecting the features to be used for each dataset results in superior performance. The `max_features` argument indicates the number of features to be used. For example, if there were 10 features in the dataset and the `max_features` argument was defined as 0.8, then only 8 (0.8 x 10) features would be used to fit a model using the base learner.

Let's explore ensemble learning with bagging in *Exercise 15.05*, *Ensemble Learning Using Bagging*.
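Before the exercise, here is a minimal, hedged sketch of these arguments in use; the base learner, synthetic data, and the 0.8 proportions are assumptions (note that newer scikit-learn releases rename `base_estimator` to `estimator`):

```
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 50 logistic regression base learners, each fit on a bootstrapped sample
# drawn from 80% of the rows and using 80% of the features.
bagging = BaggingClassifier(base_estimator=LogisticRegression(max_iter=1000), \
                            n_estimators=50, max_samples=0.8, \
                            max_features=0.8, random_state=42)
bagging.fit(X_train, y_train)
print(bagging.score(X_test, y_test))
```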
Exercise 15.05: Ensemble Learning Using Bagging
-----------------------------------------------
@@ -232,12 +232,6 @@ The following steps will help you to complete this exercise:
```

The use of the backslash character, `\`, on *line 4* in the preceding code snippet is to enforce the continuation of code onto a new line in Python. The `\` character is not required if you are entering the full line of code on a single line in your notebook.

You should get the following output:
+1
-18
@@ -81,12 +81,6 @@ The following steps will help you to complete this exercise:
```

Note

The `#` symbol in the code snippet above denotes a code comment. Comments are added to code to help explain specific bits of logic.

The `pd.read_csv()` function's arguments are the filename as a string and the field delimiter of the CSV, which is `";"`. After reading the file, the DataFrame is printed
@@ -289,23 +283,12 @@ their age. We will be using a line graph for this exercise.
The following steps will help you to complete this exercise:

1. Begin by defining the hypothesis.

The first step in the verification process will be to define a hypothesis about the relationship. A hypothesis can be based on your experiences, domain knowledge, some published pieces of knowledge, or your business intuitions.

Let's first define our hypothesis on age and propensity to buy term deposits:

*The propensity to buy term deposits is higher among elderly customers than among younger ones*. This is our hypothesis.

Now that we have defined our hypothesis, it is time to verify its veracity with the data. One of the best ways to get business intuitions from data is by taking cross-sections of our data and visualizing them.

2. Import the pandas and altair packages:

```
+7
-130
@@ -75,21 +75,12 @@ from the DataFrame:
```
target = df.pop('Activity')
```

Now the response variable is contained in the variable called `target` and all the features are in the DataFrame called `df`.

Now we are going to split the dataset into training and testing sets. The model uses the training set to learn the parameters relevant to predicting the response variable. The test set is used to check whether a model can accurately predict unseen data. We say the model is overfitting when it has learned patterns relevant only to the training set and makes incorrect predictions on the testing set. In this case, the model's performance will be much higher on the training set compared to the testing one. Ideally, we want a very similar level of performance on the training and testing sets. This topic will be covered in more depth in *Lab 7*, *The Generalization of Machine Learning Models*.

The `sklearn` package provides a function called `train_test_split()` to randomly split the dataset into two
@@ -116,6 +107,7 @@ class from `sklearn.ensemble`:
```
from sklearn.ensemble import RandomForestClassifier
```

Now we can instantiate the Random Forest classifier with some hyperparameters. Remember from *Lab 1, Introduction to Data Science in Python*, a hyperparameter is a type of parameter the model can't
@@ -203,15 +195,9 @@ The output will be as follows:


Caption: Accuracy score on the training set

Remember, in the last section, we split the dataset into training and testing sets. We used the training set to fit the model and assess its predictive power on it. But the model hasn't seen the observations from the testing set at all, so we can use them to assess whether our model is capable of generalizing to unseen data. Let's calculate the accuracy score for the testing set:

```
test_preds = rf_model.predict(X_test)
@@ -438,94 +424,15 @@ score:
Number of Trees Estimator
-------------------------

Now that we know how to fit a Random Forest classifier and assess its performance, it is time to dig into the details. In the coming sections, we will learn how to tune some of the most important hyperparameters for this algorithm. As mentioned in *Lab 1, Introduction to Data Science in Python*, hyperparameters are parameters that are not learned automatically by machine learning algorithms. Their values have to be set by data scientists. These hyperparameters can have a huge impact on the performance of a model, its ability to generalize to unseen data, and the time taken to learn patterns from the data.

The first hyperparameter you will look at in this section is called `n_estimators`. This hyperparameter is responsible for defining the number of trees that will be trained by the `RandomForest` algorithm.

Before looking at how to tune this hyperparameter, we need to understand what a tree is and why it is so important for the `RandomForest` algorithm.

A tree is a logical graph that maps a decision and its outcomes at each of its nodes. Simply speaking, it is a series of yes/no (or true/false) questions that lead to different outcomes.

A leaf is a special type of node where the model will make a prediction. There will be no split after a leaf. A single node split of a tree may look like this:



Caption: Example of a single tree node

A tree node is composed of a question and two outcomes, depending on whether the condition defined by the question is met or not. In the preceding example, the question is `is avg_rss12 > 41?` If the answer is yes, the outcome is the `bending_1` leaf and if not, it will be the `sitting` leaf.

A tree is just a series of nodes and leaves combined together:



Caption: Example of a tree

In the preceding example, the tree is composed of three nodes with different questions. Now, for an observation to be predicted as `sitting`, it will need to meet the conditions: `avg_rss13 <= 41`, `var_rss > 0.7`, and `avg_rss13 <= 16.25`.

The `RandomForest` algorithm will build this kind of tree based on the training data it sees. We will not go through the mathematical details of how it defines the split for each node but, basically, it will go through every column of the dataset and see which split value will best help to separate the data into two groups of similar classes. Taking the preceding example, the first node with the `avg_rss13 > 41` condition will help to get the group of data on the left-hand side with mostly the `bending_1` class. The `RandomForest` algorithm usually builds several trees of this kind, and this is the reason why it is called a forest.

As you may have guessed by now, the `n_estimators` hyperparameter is used to specify the number of trees the `RandomForest` algorithm will build. For example (as in the previous exercise), say we ask it to build 10 trees. For a given observation, it will ask each tree to make a prediction. Then, it will average those predictions and use the result as the final prediction for this input. For instance, if, out of 10 trees, 8 of them predict the outcome `sitting`, then the `RandomForest` algorithm will use this outcome as the final prediction.
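A hedged sketch of setting this hyperparameter; the synthetic data below merely stands in for the activity dataset used in the lab:

```
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy data standing in for the activity dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Build a forest of 10 trees, as in the previous exercise.
rf_model = RandomForestClassifier(n_estimators=10, random_state=1)
rf_model.fit(X_train, y_train)
print(accuracy_score(y_test, rf_model.predict(X_test)))
```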
Note

If you don't pass in a specific `n_estimators` hyperparameter, it will use the default value. The default depends on the version of scikit-learn you're using. In early versions, the default value is 10. From version 0.22 onwards, the default is 100. You can find out which version you are using by executing the following code:

`import sklearn`

`sklearn.__version__`

For more information, see here: <https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html>

In general, the higher the number of trees, the better the performance you will get. Let's see what happens with
@@ -1118,31 +1025,8 @@ print(accuracy_score(y_test, test_preds9))
The output will be as follows:



Caption: Accuracy scores for the training and testing sets for min_samples_leaf=25

Both accuracies for the training and testing sets decreased, but they are quite close to each other now. So, we will keep this value (`25`) as the optimal one for this dataset, as the performance is still OK and we are not overfitting too much.

When choosing the optimal value for this hyperparameter, you need to be careful: a value that's too low will increase the chance of the model overfitting, but on the other hand, setting a very high value will lead to underfitting (the model will not accurately predict the right outcome).

For instance, if you have a dataset of `1000` rows and you set `min_samples_leaf` to `400`, then the model will not be able to find good splits to predict `5` different classes. In this case, the model can only create one single split and will only be able to predict two different classes instead of `5`. It is good practice to start with low values first and then progressively increase them until you reach satisfactory performance.
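A hedged sketch of that "start low, then increase" practice; the candidate values and the synthetic data are assumptions for illustration:

```
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Progressively increase min_samples_leaf and compare train vs. test accuracy.
for leaf in [1, 5, 10, 25, 50]:
    rf = RandomForestClassifier(random_state=1, min_samples_leaf=leaf)
    rf.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, rf.predict(X_train))
    test_acc = accuracy_score(y_test, rf.predict(X_test))
    print(leaf, round(train_acc, 3), round(test_acc, 3))
```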
@@ -1258,13 +1142,6 @@ We will be using the same zoo dataset as in the previous exercise.


Caption: Accuracy scores for the training and testing sets

The accuracy score decreased for both the training and testing sets compared to the best result we got in the previous exercise. Now the difference between the training and testing sets' accuracy scores is much smaller, so our model is overfitting less.

11. Instantiate another `RandomForestClassifier` with `random_state=42`, `n_estimators=30`, `max_depth=2`, and `min_samples_leaf=7`, and
@@ -77,13 +77,6 @@ The following steps will help you complete the exercise:


Caption: The car dataset without headers

Note

Alternatively, you can enter the dataset URL in the browser to view the dataset.

`CSV` files normally have the name of each column written in the first row of the data. For instance, have a look at this dataset's CSV file, which you used in *Lab 3, Binary
@@ -1375,19 +1368,6 @@ The following steps will help you accomplish this task:


Caption: Reading the dataset

You will need to do a few things to work with this file: skip 15 rows, specify the column headers, and read the file without an index.

The code shows how you do that by creating a Python list to hold your column headers and then reading in the file using `read_csv()`. The parameters that you pass in are the file's location, the column headers as a Python list, the name of the index column (in this case, it is None), and the number of rows to skip.
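A hedged sketch of such a call; the file path and column names here are placeholders, not the dataset's actual ones:

```
import pandas as pd

# Placeholder header names and path for illustration only.
headers = ['col_1', 'col_2', 'col_3']
df = pd.read_csv('data/raw_file.csv', names=headers, \
                 index_col=None, skiprows=15)
df.head()
```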
The `head()` method will print out the top five rows, and the output should look similar to the following:
+5
-53
@@ -784,44 +784,11 @@ dataset we will use contains 1,797 labeled images of handwritten digits.


Caption: Using pandas to visualize the results

Advantages and Disadvantages of Grid Search
-------------------------------------------

The primary advantage of the grid search compared to a manual search is that it is an automated process that one can simply set and forget. Additionally, you have the power to dictate the exact hyperparameterizations evaluated, which can be a good thing when you have prior knowledge of what kind of hyperparameterizations might work well in your context. It is also easy to understand exactly what will happen during the search thanks to the explicit definition of the grid.

The major drawback of the grid search strategy is that it is computationally very expensive; that is, when the number of hyperparameterizations to try increases substantially, processing times can be very slow. Also, when you define your grid, you may inadvertently omit a hyperparameterization that would in fact be optimal. If it is not specified in your grid, it will never be tried.

To overcome these drawbacks, we will be looking at random search in the next section.

Random Search
=============

Instead of searching through every hyperparameterization in a pre-defined set, as is the case with a grid search, in a random search we sample from a distribution of possibilities by assuming each hyperparameter to be a random variable. Before we go through the process in depth, it will be helpful to briefly review what random variables are and what we mean by a distribution.

Random Variables and Their Distributions
----------------------------------------
@@ -831,6 +798,7 @@ Random Variables and Their Distributions
Caption: Probability mass function for the discrete uniform distribution

The following code will allow us to see the form of this distribution with 10 possible values of X.
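The code itself is outside this excerpt; a hedged sketch of what it could look like with `scipy.stats` follows (the plotting details are assumptions):

```
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Discrete uniform distribution over the 10 values 1..10.
x = np.arange(1, 11)
p_X = stats.randint.pmf(k=x, low=1, high=11)  # each value has probability 0.1

plt.bar(x, p_X)
plt.xlabel('X')
plt.ylabel('P(X)')
plt.show()
```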
@@ -900,9 +868,7 @@ p_X_1 = stats.norm.pdf(x=x, loc=0.0, scale=1.0**2)
p_X_2 = stats.norm.pdf(x=x, loc=0.0, scale=1.5**2)
```

**Note:** In this case, `loc` corresponds to 𝜇, while `scale` corresponds to the standard deviation, which is the square root of 𝜎², hence why we square the inputs.
@@ -1017,9 +983,7 @@ samples = stats.gamma.rvs(a=1, loc=1, scale=2, \
size=n_iter, random_state=100)
```

**Note:** We set a random state to ensure reproducible results.

Plotting a histogram of the sample, as shown in the following figure, reveals a shape that approximately conforms to the distribution that we
@@ -1086,17 +1050,14 @@ The output will be as follows:


Caption: Output for the random search process

Note

The results will be different, depending on the data used.

It is always beneficial to visualize results where possible. Plotting 𝛼 against the negative mean squared error as a scatter plot makes it clear that venturing away from 𝛼 = 1 does not result in improvements in predictive performance:

```
plt.scatter(df_result.alpha, \
            df_result.mean_neg_mean_squared_error)
@@ -1108,7 +1069,6 @@ The output will be as follows:


Caption: Plotting the scatter plot

The fact that we found the optimal 𝛼 to be 1 (its default value) is a special case in hyperparameter tuning in that the optimal
@@ -1189,9 +1149,7 @@ The output will be as follows:
Caption: Output for tuning using RandomizedSearchCV

**Note:** The preceding results may vary, depending on the data.
@@ -1351,12 +1309,6 @@ The following steps will help you complete the exercise.


Caption: Top five hyperparameterizations

Note

You may get slightly different results. However, the values you obtain should largely agree with those in the preceding output.

9. The last step is to visualize the result. Including every parameterization will result in a cluttered plot, so we will filter
+2
-15
@@ -364,11 +364,8 @@ The output will be as shown in the following figure:


Caption: Feature importance of a Random Forest model

**Note:** Due to randomization, you may get a slightly different result.

It might be a little difficult to evaluate which importance value corresponds to which variable from this output. Let's create a
@@ -1378,19 +1375,9 @@ We will be using the same dataset as in the previous exercise.
You should get the following output:



Caption: LIME output for the third observation of the testing set

You have completed the last exercise of this lab. You saw how to use LIME to interpret the predictions of single observations. We learned that the `a1pop`, `a2pop`, and `a3pop` features have a strong negative impact on the first and third observations of the training set.
Activity 9.01: Train and Analyze a Network Intrusion Detection Model