kNN Parameters & Attributes

The parameter and attribute lists below belong to Scikit-Learn's kNN classifier (KNeighborsClassifier). Their availability and defaults may differ in other frameworks and in other Scikit-Learn versions.

The lists were extracted for Scikit-Learn version 1.3.2.

# The Scikit-Learn Version
import sklearn
print(sklearn.__version__)
# 1.3.2

Parameters

Scikit-Learn's k-Nearest Neighbors (kNN) classifier has several parameters that let you configure and fine-tune the model's behavior. The key parameters are:

  1. n_neighbors: The most critical parameter in kNN. It sets the number of nearest neighbors (k) consulted for each prediction and defaults to 5. Choose it based on your problem: a small value makes the model sensitive to noise (overfitting), while a large value over-smooths the decision boundary (underfitting).

  2. weights: Controls how much each neighbor contributes to a prediction. 'uniform' (the default) treats all neighbors equally, while 'distance' weights each neighbor by the inverse of its distance, so closer neighbors have more influence. A user-defined callable is also accepted.

  3. algorithm: The algorithm used to compute nearest neighbors: 'auto' (the default), 'ball_tree', 'kd_tree', or 'brute'. With 'auto', Scikit-Learn tries to pick the most appropriate algorithm based on the data passed to fit. Fitting on sparse input overrides this setting and uses brute force.

  4. leaf_size: The leaf size passed to BallTree or KDTree, so it only matters when a tree-based algorithm is used (default 30). It affects the speed of tree construction and queries, as well as the memory needed to store the tree.

  5. p: The power parameter of the Minkowski metric, used when metric is 'minkowski' (the default). p=1 is equivalent to the Manhattan distance, p=2 (the default) to the Euclidean distance, and other values give the general Minkowski distance.

  6. metric: The distance metric used to compare data points, 'minkowski' by default. Common choices include 'euclidean', 'manhattan', 'chebyshev', and 'minkowski'; a user-defined callable can also be passed as a custom metric.

  7. metric_params: An optional dictionary (default None) of additional keyword arguments passed to the metric function.

  8. n_jobs: The number of parallel jobs used for the neighbor search; it does not affect fit. None (the default) means 1 unless running inside a joblib parallel context, and -1 uses all available CPU cores.
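
The sketch below sets each parameter from the list explicitly. It is a minimal example assuming small made-up toy arrays (X, y, X_w and y_w are illustrative names, not a real dataset). It also checks that metric='minkowski' with p=1 returns Manhattan distances and shows how the weights setting can change a prediction.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data, made up for illustration: two well-separated classes in 2D.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Every constructor parameter from the list above, set explicitly.
clf = KNeighborsClassifier(
    n_neighbors=3,        # k: how many neighbors are consulted
    weights="distance",   # closer neighbors get larger (inverse-distance) votes
    algorithm="kd_tree",  # force a KD-tree instead of letting 'auto' decide
    leaf_size=30,         # KD-tree leaf size (30 is also the default)
    p=1,                  # Minkowski power: p=1 means Manhattan distance
    metric="minkowski",   # the default metric; p selects its order
    metric_params=None,   # no extra keyword arguments for the metric
    n_jobs=-1,            # use all CPU cores for the neighbor search
)
clf.fit(X, y)
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))  # [0 1]

# With metric='minkowski' and p=1, the returned distances are Manhattan distances.
dist, idx = clf.kneighbors([[0.5, 0.5]])
manhattan = np.abs(X[idx[0]] - np.array([0.5, 0.5])).sum(axis=1)
print(np.allclose(dist[0], manhattan))  # True

# weights can flip a prediction: one close class-0 point vs two farther class-1 points.
X_w = np.array([[0.0], [3.0], [4.0]])
y_w = np.array([0, 1, 1])
for w in ("uniform", "distance"):
    model = KNeighborsClassifier(n_neighbors=3, weights=w).fit(X_w, y_w)
    print(w, model.predict([[0.5]]))
# uniform [1]   (majority vote: two of three neighbors are class 1)
# distance [0]  (1/0.5 = 2.0 outweighs 1/2.5 + 1/3.5, about 0.69)
```

With 'uniform' weights the two class-1 neighbors win the vote, while 'distance' weights let the single much closer class-0 neighbor dominate.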

These parameters let you tailor the kNN model to your specific problem. Experiment with different settings and use cross-validation to find the best combination for your dataset and use case.
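
One common way to run that cross-validation is GridSearchCV. This is a sketch assuming the Iris dataset bundled with Scikit-Learn as a stand-in for your own data; the grid values are arbitrary illustrations, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Iris ships with Scikit-Learn, so no download is needed.
X, y = load_iris(return_X_y=True)

# Candidate values for three of the parameters described above.
param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9, 11],
    "weights": ["uniform", "distance"],
    "p": [1, 2],  # Manhattan vs Euclidean under the default 'minkowski' metric
}

# 5-fold cross-validation over every combination (6 * 2 * 2 = 24 candidates).
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)           # the best combination found
print(round(search.best_score_, 3))  # its mean cross-validated accuracy
```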

Attributes

Once fitted, Scikit-Learn's kNN classifier exposes several attributes (names ending in an underscore) that describe the trained model. The key ones are:

  1. classes_: The unique class labels found in the training target, in sorted order. It tells you which label corresponds to each column of the predict_proba output.

  2. effective_metric_: The distance metric actually used. It equals the metric parameter or a synonym for it; for example, it is 'euclidean' when metric is 'minkowski' and p is 2.

  3. effective_metric_params_: The additional keyword arguments passed to the effective metric. For most metrics these match the metric_params parameter, but they may also contain the p value when effective_metric_ is 'minkowski'.

  4. n_features_in_: The number of features seen during fit.

  5. n_samples_fit_: The number of samples in the training data used to fit the model, which is handy for tracking the size of the training set.

  6. outputs_2d_: A boolean that is False when the target y passed to fit has shape (n_samples,) or (n_samples, 1), and True otherwise (multi-output targets).
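
A short sketch that fits a classifier and prints each attribute from the list, assuming a made-up four-sample toy dataset (X, y and Y2 are illustrative names only):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data (illustrative): 4 samples, 2 features, string labels.
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]])
y = np.array(["cat", "cat", "dog", "dog"])

clf = KNeighborsClassifier(n_neighbors=3, metric="minkowski", p=2).fit(X, y)

print(clf.classes_)                  # ['cat' 'dog']
print(clf.effective_metric_)         # euclidean (synonym for minkowski with p=2)
print(clf.effective_metric_params_)  # extra metric arguments actually used
print(clf.n_features_in_)            # 2
print(clf.n_samples_fit_)            # 4
print(clf.outputs_2d_)               # False, because y is one-dimensional

# A two-column target (two outputs per sample) sets outputs_2d_ to True.
Y2 = np.column_stack([[0, 0, 1, 1], [0, 1, 0, 1]])
clf2 = KNeighborsClassifier(n_neighbors=3).fit(X, Y2)
print(clf2.outputs_2d_)              # True
```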

These attributes give insight into the trained kNN model and are useful for advanced analysis and debugging. Keep in mind that some of them are specific to Scikit-Learn's implementation of kNN, and their availability may change between library versions.
