Use Azure Machine Learning with the Fairlearn open-source package to assess the fairness of ML models (preview)

APPLIES TO: Basic edition, Enterprise edition (Upgrade to Enterprise edition)

In this how-to guide, you will learn to use the Fairlearn open-source Python package with Azure Machine Learning to perform the following tasks:

  • Assess the fairness of your model predictions. To learn more about fairness in machine learning, see the fairness in machine learning article.
  • Upload, list, and download fairness assessment insights to and from Azure Machine Learning studio.
  • View the fairness assessment dashboard in Azure Machine Learning studio to interact with your models' fairness insights.

Note

Fairness assessment is not a purely technical exercise. This package can help you assess the fairness of a machine learning model, but only you can configure and make decisions about how the model is used. While this package helps identify quantitative metrics to assess fairness, developers of machine learning models must also perform a qualitative analysis to evaluate the fairness of their own models.

Azure Machine Learning Fairness SDK

The Azure Machine Learning Fairness SDK, azureml-contrib-fairness, integrates the open-source Python package, Fairlearn, within Azure Machine Learning. To learn more about Fairlearn's integration within Azure Machine Learning, check out these sample notebooks. For more information on Fairlearn, see the user guide and example notebooks.

Use the following commands to install the azureml-contrib-fairness and fairlearn packages:

pip install azureml-contrib-fairness
pip install fairlearn==0.4.6
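
After installation, you can optionally confirm that the pinned versions resolved correctly; a quick check, assuming both installs succeeded:

import azureml.core
import fairlearn

# Print the installed versions to confirm the environment is set up as expected
print("azureml.core version:", azureml.core.VERSION)
print("fairlearn version:", fairlearn.__version__)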

Upload fairness insights for a single model

The following example shows how to use the fairness package to upload model fairness insights into Azure Machine Learning and see the fairness assessment dashboard in Azure Machine Learning studio.

  1. Train a sample model in a Jupyter notebook.

    For the dataset, we use the well-known adult census dataset, which we load using shap (for convenience). For the purposes of this example, we treat this dataset as a loan decision problem and pretend that the label indicates whether or not each individual repaid a loan in the past. We use the data to train a predictor that predicts whether previously unseen individuals will repay a loan. The assumption is that the model predictions are used to decide whether an individual should be offered a loan.

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import LabelEncoder, StandardScaler
    import pandas as pd
    import shap
    
    # Load the census dataset
    X_raw, Y = shap.datasets.adult()
    X_raw["Race"].value_counts().to_dict()
    
    
    # Separate the "Sex" and "Race" sensitive features out and drop them from the main data prior to training your model
    A = X_raw[['Sex','Race']]
    X = X_raw.drop(labels=['Sex', 'Race'], axis=1)
    X = pd.get_dummies(X)
    
    sc = StandardScaler()
    X_scaled = sc.fit_transform(X)
    X_scaled = pd.DataFrame(X_scaled, columns=X.columns)
    
    # Perform some standard data preprocessing steps to convert the data into a format suitable for the ML algorithms
    le = LabelEncoder()
    Y = le.fit_transform(Y)
    
    # Split data into train and test sets
    X_train, X_test, Y_train, Y_test, A_train, A_test = train_test_split(
        X_scaled, Y, A, test_size=0.2, random_state=0, stratify=Y)
    
    # Work around indexing issue
    X_train = X_train.reset_index(drop=True)
    A_train = A_train.reset_index(drop=True)
    X_test = X_test.reset_index(drop=True)
    A_test = A_test.reset_index(drop=True)
    
    # Replace the sensitive features' numeric codes with readable labels
    A_test['Sex'] = A_test['Sex'].map({0: 'female', 1: 'male'})
    A_test['Race'] = A_test['Race'].map({0: 'Amer-Indian-Eskimo',
                                         1: 'Asian-Pac-Islander',
                                         2: 'Black',
                                         3: 'Other',
                                         4: 'White'})
    
    
    # Train a classification model
    lr_predictor = LogisticRegression(solver='liblinear', fit_intercept=True)
    lr_predictor.fit(X_train, Y_train)
    
    # (Optional) View this model in Fairlearn's fairness dashboard, and see the disparities which appear:
    from fairlearn.widget import FairlearnDashboard
    FairlearnDashboard(sensitive_features=A_test, 
                       sensitive_feature_names=['Sex', 'Race'],
                       y_true=Y_test,
                       y_pred={"lr_model": lr_predictor.predict(X_test)})
    
  2. Log into Azure Machine Learning and register your model.

    The fairness dashboard can integrate with registered or unregistered models. Register your model in Azure Machine Learning with the following code:

    from azureml.core import Workspace, Experiment, Model
    import joblib
    import os
    
    ws = Workspace.from_config()
    ws.get_details()
    
    os.makedirs('models', exist_ok=True)
    
    # Function to register models into Azure Machine Learning
    def register_model(name, model):
        print("Registering ", name)
        model_path = "models/{0}.pkl".format(name)
        joblib.dump(value=model, filename=model_path)
        registered_model = Model.register(model_path=model_path,
                                          model_name=name,
                                          workspace=ws)
        print("Registered ", registered_model.id)
        return registered_model.id
    
    # Call the register_model function 
    lr_reg_id = register_model("fairness_logistic_regression", lr_predictor)
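
    If you want to verify that the registration succeeded, you can list the matching models in the workspace; an optional check using the Model class imported above:

    # (Optional) Confirm the model appears in the workspace registry
    for m in Model.list(ws, name="fairness_logistic_regression"):
        print(m.name, m.version, m.id)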
    
  3. Precompute fairness metrics.

    Create a dashboard dictionary using Fairlearn's metrics package. The _create_group_metric_set method has arguments similar to the Dashboard constructor, except that the sensitive features are passed as a dictionary (to ensure that names are available). We must also specify the type of prediction (binary classification in this case) when calling this method.

    from fairlearn.metrics._group_metric_set import _create_group_metric_set
    
    # Create dictionaries of the sensitive features and the model(s) you want to assess for fairness
    sf = { 'Race': A_test.Race, 'Sex': A_test.Sex }
    ys_pred = { lr_reg_id: lr_predictor.predict(X_test) }
    
    dash_dict = _create_group_metric_set(y_true=Y_test,
                                         predictions=ys_pred,
                                         sensitive_features=sf,
                                         prediction_type='binary_classification')
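
    Before uploading, you can peek at the structure of the returned dashboard dictionary. This is an optional check; the exact keys are an implementation detail of Fairlearn's metrics package:

    # Inspect the top-level structure of the precomputed metrics
    print("Dashboard dictionary keys:", list(dash_dict.keys()))
    print("Models covered:", list(ys_pred.keys()))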
    
  4. Upload the precomputed fairness metrics.

    Now, import the azureml.contrib.fairness package to perform the upload:

    from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id
    

    Create an Experiment, then a Run, and upload the dashboard to it:

    exp = Experiment(ws, "Test_Fairness_Census_Demo")
    print(exp)
    
    run = exp.start_logging()
    
    # Upload the dashboard to Azure Machine Learning
    try:
        dashboard_title = "Fairness insights of Logistic Regression Classifier"
        # Set validate_model_ids parameter of upload_dashboard_dictionary to False if you have not registered your model(s)
        upload_id = upload_dashboard_dictionary(run,
                                                dash_dict,
                                                dashboard_name=dashboard_title)
        print("\nUploaded to id: {0}\n".format(upload_id))
    
        # To test the dashboard, you can download it back and ensure it contains the right information
        downloaded_dict = download_dashboard_by_upload_id(run, upload_id)
    finally:
        run.complete()
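
    If you did not register your model, the upload can still succeed: as the comment above notes, pass validate_model_ids=False so that the dictionary's model IDs are not checked against the registry. A sketch of that variant, which would replace the upload call inside the try block above (the keys of ys_pred can then be arbitrary display names rather than registered model IDs):

    # Variant for unregistered models: skip registry validation of model IDs
    upload_id = upload_dashboard_dictionary(run,
                                            dash_dict,
                                            dashboard_name=dashboard_title,
                                            validate_model_ids=False)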
    
  5. Check the fairness dashboard in Azure Machine Learning studio

    After you complete the previous steps (uploading the generated fairness insights to Azure Machine Learning), you can view the fairness dashboard in Azure Machine Learning studio. This dashboard is the same visualization dashboard provided in Fairlearn, enabling you to analyze the disparities among your sensitive feature's subgroups (e.g., male vs. female). Follow one of these paths to access the visualization dashboard in Azure Machine Learning studio:

    • Experiments pane (Preview)
    1. Select Experiments in the left pane to see a list of experiments that you've run on Azure Machine Learning.
    2. Select a particular experiment to view all the runs in that experiment.
    3. Select a run, and then the Fairness tab to view the fairness visualization dashboard.

    (Screenshot: fairness dashboard in Azure Machine Learning studio)

    • Models pane
    1. If you registered your original model by following the previous steps, you can select Models in the left pane to view it.
    2. Select a model, and then the Fairness tab to view the fairness visualization dashboard.

    To learn more about the visualization dashboard and what it contains, please check out Fairlearn's user guide.

Upload fairness insights for multiple models

If you are interested in comparing multiple models and seeing how their fairness assessments differ, you can pass more than one model to the visualization dashboard and navigate their performance-fairness trade-offs.

  1. Train your models:

    In addition to the previous logistic regression model, we create a second classifier based on a Support Vector Machine estimator, and upload a fairness dashboard dictionary using Fairlearn's metrics package. Note that here we skip the data loading and preprocessing steps and go straight to model training.

    # Train your first classification model
    from sklearn.linear_model import LogisticRegression
    lr_predictor = LogisticRegression(solver='liblinear', fit_intercept=True)
    lr_predictor.fit(X_train, Y_train)
    
    # Train your second classification model
    from sklearn import svm
    svm_predictor = svm.SVC()
    svm_predictor.fit(X_train, Y_train)
    
  2. Register your models

    Next, register both models within Azure Machine Learning. For convenience in subsequent method calls, store the results in a dictionary that maps the ID of the registered model (a string in name:version format) to the predictor itself:

    model_dict = {}
    
    lr_reg_id = register_model("fairness_logistic_regression", lr_predictor)
    model_dict[lr_reg_id] = lr_predictor
    
    svm_reg_id = register_model("fairness_svm", svm_predictor)
    model_dict[svm_reg_id] = svm_predictor
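
    The keys of model_dict are the registered model IDs; printing them shows the name:version format mentioned above:

    # Each key is a registered model ID in "name:version" format
    for model_id in model_dict:
        print(model_id)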
    
  3. Load the Fairlearn dashboard locally

    Before uploading the fairness insights into Azure Machine Learning, you can examine these predictions in a locally invoked Fairlearn dashboard.

    #  Generate models' predictions and load the fairness dashboard locally 
    ys_pred = {}
    for n, p in model_dict.items():
        ys_pred[n] = p.predict(X_test)
    
    from fairlearn.widget import FairlearnDashboard
    
    FairlearnDashboard(sensitive_features=A_test,
                       sensitive_feature_names=['Sex', 'Race'],
                       y_true=Y_test.tolist(),
                       y_pred=ys_pred)
    
  4. Precompute fairness metrics.

    Create a dashboard dictionary using Fairlearn's metrics package.

    from fairlearn.metrics._group_metric_set import _create_group_metric_set
    
    sf = { 'Race': A_test.Race, 'Sex': A_test.Sex }
    
    dash_dict = _create_group_metric_set(y_true=Y_test,
                                         predictions=ys_pred,
                                         sensitive_features=sf,
                                         prediction_type='binary_classification')
    
  5. Upload the precomputed fairness metrics.

    Now, import the azureml.contrib.fairness package to perform the upload:

    from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id
    

    Create an Experiment, then a Run, and upload the dashboard to it:

    exp = Experiment(ws, "Compare_Two_Models_Fairness_Census_Demo")
    print(exp)
    
    run = exp.start_logging()
    
    # Upload the dashboard to Azure Machine Learning
    try:
        dashboard_title = "Fairness Assessment of Logistic Regression and SVM Classifiers"
        # Set validate_model_ids parameter of upload_dashboard_dictionary to False if you have not registered your model(s)
        upload_id = upload_dashboard_dictionary(run,
                                                dash_dict,
                                                dashboard_name=dashboard_title)
        print("\nUploaded to id: {0}\n".format(upload_id))
    
        # To test the dashboard, you can download it back and ensure it contains the right information
        downloaded_dict = download_dashboard_by_upload_id(run, upload_id)
    finally:
        run.complete()
    

    As in the previous section, you can follow one of the paths described earlier (via Experiments or Models) in Azure Machine Learning studio to access the visualization dashboard and compare the two models in terms of fairness and performance.

Upload unmitigated and mitigated fairness insights

You can use Fairlearn's mitigation algorithms, compare the mitigated models they generate to the original unmitigated model, and navigate the performance/fairness trade-offs among the compared models.

To see an example that demonstrates the use of the Grid Search mitigation algorithm (which creates a collection of mitigated models with different fairness and performance trade-offs), check out this sample notebook. A rough sketch of that flow appears below.
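
The sketch below reuses the training data from the earlier steps to run Fairlearn's GridSearch and collect the resulting mitigated predictors. This is a minimal illustration, assuming the fairlearn.reductions API of the pinned 0.4.6 release; the attribute holding the fitted predictors has been renamed across Fairlearn releases, so treat the last line as an assumption and refer to the sample notebook for the complete, tested workflow.

from fairlearn.reductions import GridSearch, DemographicParity
from sklearn.linear_model import LogisticRegression

# Sweep over a grid of trade-off points between accuracy and demographic parity
sweep = GridSearch(LogisticRegression(solver='liblinear', fit_intercept=True),
                   constraints=DemographicParity(),
                   grid_size=10)
sweep.fit(X_train, Y_train, sensitive_features=A_train.Sex)

# Each fitted predictor is a candidate mitigated model: register each one,
# build a ys_pred dictionary as in the earlier steps, and upload a single
# dashboard dictionary covering all of them for side-by-side comparison.
mitigated_predictors = sweep.predictors_  # exposed as _predictors in some releases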

Uploading multiple models' fairness insights in a single Run allows you to compare the models with respect to fairness and performance. You can select any of the models displayed in the model comparison chart to see the detailed fairness insights of that particular model.

(Screenshot: model comparison view in the Fairlearn dashboard)

Next steps

Learn more about model fairness

Check out Azure Machine Learning Fairness sample notebooks