#@title Licensed under the Apache License, Version 2.0 (the "License"); { display-mode: "form" }
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```
%% Cell type:markdown id: tags:
# TensorFlow Fairness Indicators Example Using CelebA Dataset
%% Cell type:markdown id: tags:
<table class="tfo-notebook-buttons" align="left">
<td>
<a target="_blank" href="https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_TFCO_CelebA_Case_Study"><img src="https://www.tensorflow.org/images/tf_logo_32px.png"/>Original Code on TensorFlow.org</a>
</td>
<td>
<a target="_blank" href="https://colab.research.google.com/github/zkmkarlsruhe/bias-workshop/blob/main/Fairness_Indicators_CelebA_Case_Study.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png"/>Run in Google Colab</a>
</td>
<td>
<a target="_blank" href="https://github.com/zkmkarlsruhe/bias-workshop//blob/main/Fairness_Indicators_CelebA_Case_Study.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"/>View on GitHub</a>
</td>
</table>
%% Cell type:markdown id: tags:
This notebook explores bias in images using Google’s [Fairness Indicators](https://www.tensorflow.org/responsible_ai/fairness_indicators/guide). In particular, this notebook will:
* Train a simple neural network model to detect a person's smile in images using [`tf.keras`](https://www.tensorflow.org/guide/keras) and the large-scale CelebFaces Attributes ([CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)) dataset.
* Evaluate model performance against a commonly used fairness metric across age groups, using Fairness Indicators.
* Let you try out the model by taking a selfie.
%% Cell type:markdown id: tags:
# Acknowledgement
We gratefully thank TensorFlow for providing tools and examples to explore the topic of bias in Machine Learning (ML) applications.
We have made small changes to the provided notebook to suit our needs for this workshop. Please visit the [website](https://www.tensorflow.org/responsible_ai/) to learn more about how TensorFlow helps make AI more responsible (including detecting and mitigating bias). The original notebook can be found [here](https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_TFCO_CelebA_Case_Study).
The changes to the original code were developed at [ZKM | Hertz-Lab](https://zkm.de/en/about-the-zkm/organization/hertz-lab) as part of the project [»The Intelligent Museum«](https://zkm.de/en/project/the-intelligent-museum), which is generously funded by the Digital Culture Programme of the [Kulturstiftung des Bundes](https://www.kulturstiftung-des-bundes.de/en) (German Federal Cultural Foundation). Other code developed as part of this project is available at [intelligent.museum/code](https://intelligent.museum/code).
%% Cell type:markdown id: tags:
# Installation and Import
This notebook was created in [Colaboratory](https://research.google.com/colaboratory/faq.html), connected to the Python 3 Google Compute Engine backend.
We will start by installing the Python packages needed to get the required data (tensorflow-datasets), train a neural network (tensorflow), and evaluate it (fairness-indicators / tensorflow-model-analysis). Afterwards, we import specific modules from those libraries.
__Important:__ The very first time you run the pip installs, you may be asked to restart the runtime because of preinstalled, out-of-date packages. Once you do so, the correct packages will be used.
%% Cell type:code id: tags:
```
#@title Pip installs
!pip install -q -U pip==20.2
!pip install -q tensorflow-datasets tensorflow
!pip install fairness-indicators \
"absl-py==0.12.0" \
"apache-beam<3,>=2.28" \
"avro-python3==1.9.1" \
"pyzmq==17.0.0"
```
%% Cell type:code id: tags:
```
#@title Import Modules { display-mode: "form" }
import os
import sys
import tempfile
import urllib
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
import numpy as np
from tensorflow_metadata.proto.v0 import schema_pb2
from tfx_bsl.tfxio import tensor_adapter
from tfx_bsl.tfxio import tf_example_record
import tensorflow_model_analysis as tfma
import fairness_indicators as fi
from google.protobuf import text_format
import apache_beam as beam
# Enable eager execution and print versions.
if tf.__version__ < "2.0.0":
  tf.compat.v1.enable_eager_execution()
  print("Eager execution enabled.")
else:
  print("Eager execution enabled by default.")
print("TensorFlow " + tf.__version__)
print("TFMA " + tfma.VERSION_STRING)
print("TFDS " + tfds.version.__version__)
print("FI " + fi.version.__version__)
```
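%% Cell type:markdown id: tags:
One caveat about the version check in the cell above: `tf.__version__ < "2.0.0"` compares strings character by character, so a hypothetical version such as `"10.0.0"` would sort before `"2.0.0"`. A minimal sketch of a numeric comparison, assuming plain dotted version strings (the helper name `version_tuple` is ours and is not part of TensorFlow):
%% Cell type:code id: tags:
```python
import re


def version_tuple(version):
    """Turn a dotted version string such as '2.4.1' into (2, 4, 1).

    Only the leading digits of the first three parts are used, so a suffix
    like 'rc1' in '2.5.0rc1' is ignored (an assumption of this sketch).
    """
    parts = []
    for part in version.split(".")[:3]:
        match = re.match(r"\d+", part)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


# String comparison is lexicographic and can mislead.
print("10.0.0" < "2.0.0")
# Tuple comparison is numeric.
print(version_tuple("10.0.0") < version_tuple("2.0.0"))
print(version_tuple("1.15.0") < (2, 0, 0))
```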
%% Cell type:markdown id: tags:
# CelebA Dataset
[CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) is a large-scale face attributes dataset with more than 200,000 celebrity images, each with 40 attribute annotations (such as hair type, fashion accessories, facial features, etc.) and 5 landmark locations (eyes, mouth and nose positions). For more details take a look at [the paper](https://liuziwei7.github.io/projects/FaceAttributes.html).
With the permission of the owners, we have stored this dataset on Google Cloud Storage and mostly access it via [TensorFlow Datasets (`tfds`)](https://www.tensorflow.org/datasets).
In this notebook:
* Our model will attempt to classify whether the subject of the image is smiling, as represented by the "Smiling" attribute<sup>*</sup>.
* Images will be resized from 218x178 to 64x64 to reduce the execution time and memory when training.
* Our model's performance will be evaluated across age groups, using the binary "Young" attribute. We will call this "age group" in this notebook.
___
<sup>*</sup> While there is little information available about the labeling methodology for this dataset, we will assume that the "Smiling" attribute was determined by a pleased, kind, or amused expression on the subject's face. For the purpose of this case study, we will take these labels as ground truth.
%% Cell type:code id: tags:
```
#@title Download and prepare Dataset { display-mode: "form" }
```
%% Cell type:markdown id: tags:
Before moving forward, there are several considerations to keep in mind in using CelebA:
* Although in principle this notebook could use any dataset of face images, CelebA was chosen because it contains public domain images of public figures.
* All of the attribute annotations in CelebA are operationalized as binary categories. For example, the "Young" attribute (as determined by the dataset labelers) is denoted as either present or absent in the image.
* CelebA's categorizations do not reflect real human diversity of attributes.
* For the purposes of this notebook, the feature containing the "Young" attribute is referred to as "age group", where the presence of the "Young" attribute in an image is labeled as a member of the "Young" age group and the absence of the "Young" attribute is labeled as a member of the "Not Young" age group. These are assumptions made as this information is not mentioned in the [original paper](http://openaccess.thecvf.com/content_iccv_2015/html/Liu_Deep_Learning_Face_ICCV_2015_paper.html).
* As such, performance in the models trained in this notebook is tied to the ways the attributes have been operationalized and annotated by the authors of CelebA.
* This model should not be used for commercial purposes as that would violate [CelebA's non-commercial research agreement](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html).
%% Cell type:markdown id: tags:
# Defining the Challenge
In this code block we set the hyperparameters that largely define the problem we are trying to solve.
The value of `IMAGE_SIZE` determines the width and height of the images fed into the neural network. The smaller this number, the faster training runs, but the less precise the classifier becomes.
`LABEL_KEY` determines the attribute the classifier is trained on (e.g., whether a person has a mustache, is smiling, or wears a hat),
while `GROUP_KEY` defines the groups we evaluate on (e.g., male, young, chubby). Keep in mind that these are binary attributes only: the absence of the male attribute, for example, presumably denotes female.
You can find the 40 different attributes in [this table](https://www.researchgate.net/figure/List-of-the-40-face-attributes-provided-with-the-CelebA-database_tbl1_327029519).
__Note:__ After completing this exercise, feel free to experiment with these variables.
%% Cell type:code id: tags:
```
IMAGE_SIZE = 64
LABEL_KEY = "Smiling"
GROUP_KEY = "Young"
```
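%% Cell type:markdown id: tags:
To see how these keys are used, consider a single example shaped as an image plus a nested dictionary of boolean attributes. The sketch below assumes that layout and uses a hand-made dictionary with invented values; the helper `label_and_group` is a hypothetical name for illustration, not a function from this notebook.
%% Cell type:code id: tags:
```python
LABEL_KEY = "Smiling"
GROUP_KEY = "Young"
ATTR_KEY = "attributes"  # assumed name of the nested attribute dictionary

# Hand-made stand-in for one dataset example (all values are invented).
example = {
    "image": "<pixel data would go here>",
    ATTR_KEY: {"Smiling": True, "Young": False, "Male": True},
}


def label_and_group(feat_dict):
    """Return (label, group) as floats: 1.0 if the attribute is present, else 0.0."""
    label = float(feat_dict[ATTR_KEY][LABEL_KEY])
    group = float(feat_dict[ATTR_KEY][GROUP_KEY])
    return label, group


label, group = label_and_group(example)
# Absence of the "Young" attribute is treated as the "Not Young" age group.
group_name = "Young" if group == 1.0 else "Not Young"
print(label, group_name)
```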
%% Cell type:markdown id: tags:
# Setting Up Input Functions
The subsequent cells will help streamline the input pipeline as well as visualize performance.
%% Cell type:code id: tags:
```
#@title Define Preprocessing and Dataset Functions { display-mode: "form" }
ATTR_KEY = "attributes"
IMAGE_KEY = "image"
def preprocess_input_dict(feat_dict):
  # Separate out the image and target variable from the feature dictionary.

# Copy test data locally to be able to read it into tfma
copy_test_files_to_local()
```
%% Cell type:markdown id: tags:
# Build a simple CNN model
In this next block of code we define an artificial neural network with several different layers. Those layers include convolutional filters, pooling, and fully connected layers. We may be able to greatly improve model performance by adding some more complexity (e.g., more densely connected layers, different activation functions, a larger image size, different architectures, regularization methods, ...), but that may distract from the goal of demonstrating how bias manifests itself in ML models. For that reason, the model will be kept simple, but feel encouraged to explore this space.
We also define a function to set seeds to ensure reproducible results. Note that this colab is meant as an educational tool and does not have the stability of a finely tuned production pipeline. Running without setting a seed may lead to varied results.
%% Cell type:code id: tags:
```
def set_seeds():
  np.random.seed(121212)
  tf.compat.v1.set_random_seed(212121)
```
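%% Cell type:markdown id: tags:
The effect of a fixed seed can be illustrated with Python's built-in `random` module. This is only an analogy: the notebook seeds NumPy and TensorFlow rather than `random`, and the helper `draw` is a hypothetical name introduced for this sketch.
%% Cell type:code id: tags:
```python
import random


def draw(seed, n=3):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# The same seed reproduces the same sequence; a different seed gives another one.
same = draw(121212) == draw(121212)
different = draw(121212) != draw(212121)
print(same, different)
```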
%% Cell type:markdown id: tags:
# Fairness Indicators Helper Functions
Before training our model, we define a number of helper functions that will allow us to evaluate the model's performance via Fairness Indicators.
%% Cell type:markdown id: tags:
First, we create a helper function to save our model once we train it.
%% Cell type:code id: tags:
```
#@title Save Model function { display-mode: "form" }
```
%% Cell type:markdown id: tags:
With the model defined and the input pipeline in place, we’re now ready to train our model. To cut back on execution time and memory, we will train the model by slicing the data into small batches with only a few repeated iterations.
Note that running this notebook in TensorFlow < 2.0.0 may result in a deprecation warning for `np.where`. Safely ignore this warning as TensorFlow addresses this in 2.X by using `tf.where` in place of `np.where`.
However, performance evaluated across age groups may reveal some shortcomings.
To explore this further, we evaluate the model with Fairness Indicators (via TFMA). In particular, we are interested in seeing whether there is a significant gap in performance between "Young" and "Not Young" categories when evaluated on false positive rate.
A false positive error occurs when the model incorrectly predicts the positive class. In this context, a false positive outcome occurs when the ground truth is an image of a celebrity 'Not Smiling' and the model predicts 'Smiling'. By extension, the false positive rate, which is used in the visualization above, is the share of all 'Not Smiling' images that the model wrongly predicts as 'Smiling'. While this is a relatively mundane error to make in this context, false positive errors can sometimes cause more problematic behaviors. For instance, a false positive error in a spam classifier could cause a user to miss an important email.
As mentioned above, we are concentrating on the false positive rate. The current version of Fairness Indicators (0.1.2) selects false negative rate by default. After running the line below, deselect false_negative_rate and select false_positive_rate to look at the metric we are interested in.
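The metric can be made concrete with a few lines of plain Python. The sketch below computes the false positive rate, FP / (FP + TN), separately for each age group and then the gap between the groups. The records are invented for illustration, and `false_positive_rate` is a hypothetical helper, not part of Fairness Indicators or TFMA.
%% Cell type:code id: tags:
```python
def false_positive_rate(records):
    """FPR = FP / (FP + TN), computed over (label, prediction) pairs."""
    negatives = [(y, p) for y, p in records if y == 0]
    if not negatives:
        return float("nan")  # undefined when there are no negative examples
    false_positives = sum(1 for _, p in negatives if p == 1)
    return false_positives / len(negatives)


# (group, true label, predicted label); 1 = "Smiling", 0 = "Not Smiling".
# These records are made up purely for illustration.
data = [
    ("Young", 0, 0), ("Young", 0, 0), ("Young", 0, 1), ("Young", 0, 0),
    ("Young", 1, 1),
    ("Not Young", 0, 1), ("Not Young", 0, 1), ("Not Young", 0, 0),
    ("Not Young", 0, 0), ("Not Young", 1, 0),
]

fpr_by_group = {}
for group in ("Young", "Not Young"):
    pairs = [(y, p) for g, y, p in data if g == group]
    fpr_by_group[group] = false_positive_rate(pairs)

# A large gap between groups is the kind of disparity we are looking for.
gap = fpr_by_group["Not Young"] - fpr_by_group["Young"]
print(fpr_by_group, gap)
```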
%% Cell type:code id: tags:
```
      await new Promise((resolve) => capture.onclick = resolve);
      const canvas = document.createElement('canvas');
      // not a very good crop :D
      canvas.width = video.videoHeight;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);
      stream.getVideoTracks()[0].stop();
      div.remove();
      return canvas.toDataURL('image/jpeg', quality);
    }
    ''')
  display(js)
  data = eval_js('takePhoto({})'.format(quality))
  binary = b64decode(data.split(',')[1])
  with open(filename, 'wb') as f:
    f.write(binary)
  return filename
```
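%% Cell type:markdown id: tags:
The last lines of the cell above rely on the shape of a data URL: a header such as `data:image/jpeg;base64`, a comma, and then the base64-encoded bytes. A minimal round-trip sketch using only the standard library (the payload is a placeholder rather than a real JPEG, and `decode_data_url` is a hypothetical helper):
%% Cell type:code id: tags:
```python
from base64 import b64decode, b64encode


def decode_data_url(data_url):
    """Split a base64 data URL at the first comma and decode the payload."""
    header, payload = data_url.split(",", 1)
    return header, b64decode(payload)


raw = b"not really a JPEG"  # placeholder bytes for illustration
data_url = "data:image/jpeg;base64," + b64encode(raw).decode("ascii")

header, decoded = decode_data_url(data_url)
print(header)
print(decoded == raw)
```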
%% Cell type:markdown id: tags:
Next we define a function that reads the image and processes it just like the training images. This is a very important step, because neural networks are very sensitive to data that is very different from what they have seen during training.
%% Cell type:code id: tags:
```
def read_and_process_img(file_path):
  # load the raw data from the file as a string
  img = tf.io.read_file(file_path)
  # convert the compressed string to a 3D uint8 tensor