Commit 510a6305 authored by Paul Bethge

update num epochs

parent 0b165738
%% Cell type:markdown id: tags:
# Copyright 2020 The TensorFlow Authors.
%% Cell type:code id: tags:
#@title Licensed under the Apache License, Version 2.0 (the "License"); { display-mode: "form" }
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and
# limitations under the License.
%% Cell type:markdown id: tags:
# TensorFlow Fairness Indicators Example Using CelebA Dataset
%% Cell type:markdown id: tags:
This notebook explores bias in images using Google’s [Fairness Indicators]( In particular, this notebook will:
* Train a simple neural network model to detect a person's smile in images using [`tf.keras`]( and the large-scale CelebFaces Attributes ([CelebA]( dataset.
* Evaluate model performance against a commonly used fairness metric across age groups, using Fairness Indicators.
* Let you try out the model by taking a selfie.
%% Cell type:markdown id: tags:
# Acknowledgement
We hereby gratefully thank TensorFlow for providing tools and examples to explore the topic of bias in Machine Learning (ML) applications.
We have introduced small changes to the provided notebook to suit our needs for this workshop. Please visit the [website]( to learn more about how TensorFlow gives opportunities to make AI more responsible (including detecting and mitigating bias). Please find the original notebook [here](
Changes to the original code have been developed at [ZKM | Hertz-Lab]( as part of the project [»The Intelligent Museum«](, which is generously funded by the Digital Culture Programme of the [Kulturstiftung des Bundes]( (German Federal Cultural Foundation). Please find other code developed as part of this project at [](
%% Cell type:markdown id: tags:
# Installation and Import
This notebook was created in [Colaboratory](, connected to the Python 3 Google Compute Engine backend.
We will start by installing the necessary Python packages to get the required data (tensorflow-datasets), train a neural network (tensorflow) and evaluate it (fairness-indicators / tensorflow-model-analysis). Afterwards, we import specific modules from those libraries.
__Important:__ the very first time you run the pip installs, you may be asked to restart the runtime because of preinstalled, out-of-date packages. Once you do so, the correct packages will be used.
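Because a stale preinstalled package can linger until the runtime is restarted, it can help to check which versions are actually loaded. The sketch below is not part of the original notebook, and the helper names `parse_version` and `is_at_least` are hypothetical. It compares dotted version strings numerically, since a plain string comparison would rank "2.10.0" below "2.9.0".

```python
import re

def parse_version(version):
    """Turn a dotted version string into a tuple of ints, e.g. '2.10.0' -> (2, 10, 0).

    Hypothetical helper: only the leading digits of each component are used,
    so suffixes such as 'rc1' are ignored.
    """
    parts = []
    for component in version.split("."):
        match = re.match(r"\d+", component)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def is_at_least(installed, minimum):
    """Return True if the installed version is numerically >= the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

In the notebook this could be called as `is_at_least(tf.__version__, "2.0.0")` after the imports have run.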
%% Cell type:code id: tags:
#@title Pip installs
!pip install -q -U pip==20.2
!pip install -q tensorflow-datasets tensorflow
!pip install fairness-indicators \
"absl-py==0.12.0" \
"apache-beam<3,>=2.28" \
"avro-python3==1.9.1" \
%% Cell type:code id: tags:
#@title Import Modules { display-mode: "form" }
import os
import sys
import tempfile
import urllib.request
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
import numpy as np
from tensorflow_metadata.proto.v0 import schema_pb2
from tfx_bsl.tfxio import tensor_adapter
from tfx_bsl.tfxio import tf_example_record
import tensorflow_model_analysis as tfma
import fairness_indicators as fi
from google.protobuf import text_format
import apache_beam as beam
# Enable eager execution and print versions
if tf.__version__ < "2.0.0":
  tf.compat.v1.enable_eager_execution()
  print("Eager execution enabled.")
else:
  print("Eager execution enabled by default.")
print("TensorFlow " + tf.__version__)
print("TFMA " + tfma.VERSION_STRING)
print("TFDS " + tfds.version.__version__)
print("FI " + fi.version.__version__)
%% Cell type:markdown id: tags:
# CelebA Dataset
[CelebA]( is a large-scale face attributes dataset with more than 200,000 celebrity images, each with 40 attribute annotations (such as hair type, fashion accessories, facial features, etc.) and 5 landmark locations (eyes, mouth and nose positions). For more details take a look at [the paper](
With the permission of the owners, we have stored this dataset on Google Cloud Storage and mostly access it via [TensorFlow Datasets(`tfds`)](
In this notebook:
* Our model will attempt to classify whether the subject of the image is smiling, as represented by the "Smiling" attribute<sup>*</sup>.
* Images will be resized from 218x178 to 64x64 to reduce the execution time and memory when training.
* Our model's performance will be evaluated across age groups, using the binary "Young" attribute. We will call this "age group" in this notebook.
<sup>*</sup> While there is little information available about the labeling methodology for this dataset, we will assume that the "Smiling" attribute was determined by a pleased, kind, or amused expression on the subject's face. For the purpose of this case study, we will take these labels as ground truth.
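To put the resizing step in numbers, the following sketch counts how many values the network has to process per RGB image at each resolution. It is not part of the original notebook, and the helper name `values_per_image` is hypothetical.

```python
def values_per_image(height, width, channels=3):
    """Number of scalar values in one image of the given size."""
    return height * width * channels

original = values_per_image(218, 178)  # CelebA's native resolution
resized = values_per_image(64, 64)     # resolution used in this notebook
reduction = original / resized

print(f"{original} -> {resized} values per image ({reduction:.1f}x fewer)")
```

The saving applies to every image in every batch, which is why the smaller resolution noticeably shortens training at the cost of fine detail.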
%% Cell type:code id: tags:
gcs_base_dir = "gs://celeb_a_dataset/"
celeb_a_builder = tfds.builder("celeb_a", data_dir=gcs_base_dir, version='2.0.0')
num_test_shards_dict = {'0.3.0': 4, '2.0.0': 2} # Used because we download the test dataset separately
version = str(celeb_a_builder.info.version)
print('Celeb_A dataset version: %s' % version)
local_root = tempfile.mkdtemp(prefix='test-data')

def local_test_filename_base():
  return local_root

def local_test_file_full_prefix():
  return os.path.join(local_test_filename_base(), "celeb_a-test.tfrecord")

def copy_test_files_to_local():
  filename_base = local_test_file_full_prefix()
  num_test_shards = num_test_shards_dict[version]
  # Download every shard of the test split into the local temp directory.
  for shard in range(num_test_shards):
    url = "" % (version, shard, num_test_shards)
    filename = "%s-0000%s-of-0000%s" % (filename_base, shard, num_test_shards)
    res = urllib.request.urlretrieve(url, filename)
%% Cell type:markdown id: tags:
## Caveats
Before moving forward, there are several considerations to keep in mind in using CelebA:
* Although in principle this notebook could use any dataset of face images, CelebA was chosen because it contains public domain images of public figures.
* All of the attribute annotations in CelebA are operationalized as binary categories. For example, the "Young" attribute (as determined by the dataset labelers) is denoted as either present or absent in the image.
* CelebA's categorizations do not reflect real human diversity of attributes.
* For the purposes of this notebook, the feature containing the "Young" attribute is referred to as "age group", where the presence of the "Young" attribute in an image is labeled as a member of the "Young" age group and the absence of the "Young" attribute is labeled as a member of the "Not Young" age group. These are assumptions made as this information is not mentioned in the [original paper](
* As such, performance in the models trained in this notebook is tied to the ways the attributes have been operationalized and annotated by the authors of CelebA.
* This model should not be used for commercial purposes as that would violate [CelebA's non-commercial research agreement](
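The binary operationalization described above can be sketched as follows. This is an illustrative example rather than code from the original notebook: the helper name `age_group_label` and the annotation values are made up.

```python
from collections import Counter

def age_group_label(attribute_present, group_key="Young"):
    """Map a binary attribute flag to the group name used in this notebook."""
    return group_key if attribute_present else "Not " + group_key

# Made-up annotations: True means the "Young" attribute is present.
flags = [True, False, True, True]
group_counts = Counter(age_group_label(flag) for flag in flags)
print(group_counts)
```

Every image falls into exactly one of the two groups, so any finer-grained notion of age is lost before the model or the evaluation ever sees the data.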
%% Cell type:markdown id: tags:
# Defining the Challenge
In this code block, we set the hyperparameters that largely define the problem we are trying to solve.
The value of `IMAGE_SIZE` determines the width and height of the image we feed into the neural network. The smaller this number, the faster but also the less precise our classification algorithm becomes.
`LABEL_KEY` determines the attribute we train our classifier on (e.g. does a person have a mustache, smile, or wear a hat?),
while `GROUP_KEY` defines the groups we evaluate on (e.g. male, young, chubby). Keep in mind that these are only binary attributes: the absence of the male attribute probably denotes the female one.
You can find the 40 different attributes in [this table](
__Note:__ after completing this exercise, feel free to play around with those variables.
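As a rough illustration of how the two keys are used, the sketch below picks the label and the group out of a nested feature dictionary. The example record and the helper name `select_label_and_group` are made up for illustration; the layout (an image plus an `"attributes"` dictionary of binary flags) mirrors what the preprocessing functions in this notebook read.

```python
LABEL_KEY = "Smiling"
GROUP_KEY = "Young"

# Made-up record in the nested layout used by the notebook.
example = {
    "image": "<pixel data omitted>",
    "attributes": {"Smiling": True, "Young": False, "Male": True},
}

def select_label_and_group(feat_dict, label_key, group_key):
    """Return (label, group) for the chosen attribute names."""
    attributes = feat_dict["attributes"]
    return attributes[label_key], attributes[group_key]

print(select_label_and_group(example, LABEL_KEY, GROUP_KEY))
```

Changing either key to another attribute name swaps the prediction target or the evaluation slice without touching the rest of the pipeline.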
%% Cell type:code id: tags:
LABEL_KEY = "Smiling"
GROUP_KEY = "Young"
%% Cell type:markdown id: tags:
# Setting Up Input Functions
The subsequent cells will help streamline the input pipeline as well as visualize performance.
%% Cell type:code id: tags:
#@title Define Preprocessing and Dataset Functions { display-mode: "form" }
ATTR_KEY = "attributes"
IMAGE_KEY = "image"
def preprocess_input_dict(feat_dict):
  # Separate out the image and target variable from the feature dictionary.
  image = feat_dict[IMAGE_KEY]
  label = feat_dict[ATTR_KEY][LABEL_KEY]
  group = feat_dict[ATTR_KEY][GROUP_KEY]
  # Resize and normalize image.
  image = tf.cast(image, tf.float32)
  image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE])
  image /= 255.0
  # Cast label and group to float32.
  label = tf.cast(label, tf.float32)
  group = tf.cast(group, tf.float32)
  feat_dict[IMAGE_KEY] = image
  feat_dict[ATTR_KEY][LABEL_KEY] = label
  feat_dict[ATTR_KEY][GROUP_KEY] = group
  return feat_dict

get_image_and_label = lambda feat_dict: (feat_dict[IMAGE_KEY], feat_dict[ATTR_KEY][LABEL_KEY])
get_image_label_and_group = lambda feat_dict: (feat_dict[IMAGE_KEY], feat_dict[ATTR_KEY][LABEL_KEY], feat_dict[ATTR_KEY][GROUP_KEY])
# Train data returning either 2 or 3 elements (the third element being the group)
def celeb_a_train_data_wo_group(batch_size):
  celeb_a_train_data = celeb_a_builder.as_dataset(split='train').shuffle(1024).repeat().batch(batch_size).map(preprocess_input_dict)
  return celeb_a_train_data.map(get_image_and_label)

def celeb_a_train_data_w_group(batch_size):
  celeb_a_train_data = celeb_a_builder.as_dataset(split='train').shuffle(1024).repeat().batch(batch_size).map(preprocess_input_dict)
  return celeb_a_train_data.map(get_image_label_and_group)

# Test data for the overall evaluation
celeb_a_test_data = celeb_a_builder.as_dataset(split='test').batch(1).map(preprocess_input_dict).map(get_image_label_and_group)
# Copy test data locally to be able to read it into tfma
copy_test_files_to_local()
%% Cell type:markdown id: tags:
# Build a simple CNN model
In this next block of code, we define an artificial neural network with several different layers. Those layers include convolutional filters, pooling and fully connected layers. We may be able to greatly improve model performance by adding some more complexity (e.g., more densely connected layers, different activation functions, a larger image size, different architectures, regularization methods, ...), but that may distract from the goal of demonstrating how bias manifests itself in ML models. For that reason, the model will be kept simple, but feel encouraged to explore this space.
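To get a feeling for how small this model is, the sketch below counts the trainable parameters of the layer shapes that appear in the model definition: two 3x3 convolutions with 16 and 32 filters, and the final single-unit dense layer fed by 64 units. The helper names are hypothetical, and the count deliberately leaves out the 64-unit dense layer, whose input size depends on how the feature maps are pooled and flattened.

```python
def conv2d_params(in_channels, filters, kernel_size):
    """Weights plus one bias per filter for a square-kernel 2D convolution."""
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(in_features, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return in_features * units + units

first_conv = conv2d_params(in_channels=3, filters=16, kernel_size=3)
second_conv = conv2d_params(in_channels=16, filters=32, kernel_size=3)
output_layer = dense_params(in_features=64, units=1)

print(first_conv, second_conv, output_layer)
```

Convolutional layers stay cheap because their parameter count does not depend on the image size; it is the first dense layer after flattening that typically dominates.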
%% Cell type:code id: tags:
from tensorflow.keras import layers
def create_model():
  # For this notebook, accuracy will be used to evaluate performance.
  model = keras.Sequential([
      layers.InputLayer(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), name='image'),
      layers.Conv2D(16, 3, padding='same', activation='relu'),
      layers.Conv2D(32, 3, padding='same', activation='relu'),
      layers.Dense(64, activation='relu'),
      layers.Dense(1, activation=None)
  ])
  return model
%% Cell type:markdown id: tags:
We also define a function to set seeds to ensure reproducible results. Note that this colab is meant as an educational tool and does not have the stability of a finely tuned production pipeline. Running without setting a seed may lead to varied results.
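The effect of seeding can be sketched with Python's built-in generator alone. This is an illustration rather than the notebook's own seeding function: the names `set_python_seed` and `sample_batch_order` and the seed value are made up, and in the notebook the NumPy and TensorFlow generators would have to be seeded as well.

```python
import random

def set_python_seed(seed=0):
    """Seed Python's built-in random generator (arbitrary seed value)."""
    random.seed(seed)

def sample_batch_order(n=5):
    """Draw a few pseudo-random numbers, standing in for shuffling or weight init."""
    return [random.randint(0, 999) for _ in range(n)]

set_python_seed()
first = sample_batch_order()
set_python_seed()
second = sample_batch_order()

print(first == second)
```

Resetting the seed before each run makes the pseudo-random draws repeat, which is what makes two training runs comparable.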
%% Cell type:code id: tags:
def set_seeds():
%% Cell type:markdown id: tags:
# Fairness Indicators Helper Functions
Before training our model, we define a number of helper functions that will allow us to evaluate the model's performance via Fairness Indicators.
%% Cell type:markdown id: tags:
First, we create a helper function to save our model once we train it.
%% Cell type:code id: tags:
#@title Save Model function { display-mode: "form" }
def save_model(model, subdir):
  base_dir = tempfile.mkdtemp(prefix='saved_models')
  model_location = os.path.join(base_dir, subdir)
  # Export in the SavedModel format so that TFMA can load the model.
  model.save(model_location, save_format='tf')
  return model_location
%% Cell type:markdown id: tags:
Next, we define functions used to preprocess the data in order to correctly pass it through to TFMA.
%% Cell type:code id: tags:
#@title Data Preprocessing functions { display-mode: "form" }
def tfds_filepattern_for_split(dataset_name, split):
  return f"{local_test_file_full_prefix()}*"

class PreprocessCelebA(object):
  """Class that deserializes, decodes and applies additional preprocessing for CelebA input."""
  def __init__(self, dataset_name):
    builder = tfds.builder(dataset_name)
    self.features = builder.info.features
    example_specs = self.features.get_serialized_info()
    self.parser = tfds.core.example_parser.ExampleParser(example_specs)

  def __call__(self, serialized_example):
    # Deserialize
    deserialized_example = self.parser.parse_example(serialized_example)
    # Decode
    decoded_example = self.features.decode_example(deserialized_example)
    # Additional preprocessing
    image = decoded_example[IMAGE_KEY]
    label = decoded_example[ATTR_KEY][LABEL_KEY]
    # Resize and scale image.
    image = tf.cast(image, tf.float32)
    image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE])
    image /= 255.0
    image = tf.reshape(image, [-1])
    # Cast label and group to float32.
    label = tf.cast(label, tf.float32)
    group = decoded_example[ATTR_KEY][GROUP_KEY]
    # Write the flattened image, the label and the group name into a tf.train.Example.
    output = tf.train.Example()
    output.features.feature[IMAGE_KEY].float_list.value.extend(image.numpy().tolist())
    output.features.feature[LABEL_KEY].float_list.value.append(label.numpy())
    group_name = GROUP_KEY if group.numpy() else 'Not ' + GROUP_KEY
    output.features.feature[GROUP_KEY].bytes_list.value.append(group_name.encode('utf-8'))
    return output.SerializeToString()

def tfds_as_pcollection(beam_pipeline, dataset_name, split):
  return (
      beam_pipeline
      | 'Read records' >> beam.io.ReadFromTFRecord(tfds_filepattern_for_split(dataset_name, split))
      | 'Preprocess' >> beam.Map(PreprocessCelebA(dataset_name))
  )
%% Cell type:markdown id: tags:
Finally, we define a function that evaluates the results in TFMA.
%% Cell type:code id: tags:
#@title TFMA Evaluation function { display-mode: "form" }
def get_eval_results(model_location, eval_subdir):
  base_dir = tempfile.mkdtemp(prefix='saved_eval_results')
  tfma_eval_result_path = os.path.join(base_dir, eval_subdir)

  # Evaluate overall and sliced by group, using Fairness Indicators at threshold 0.5.
  eval_config_pbtxt = """
        model_specs {
          label_key: "%s"
        }
        metrics_specs {
          metrics {
            class_name: "FairnessIndicators"
            config: '{ "thresholds": [0.5] }'
          }
          metrics {
            class_name: "ExampleCount"
          }
        }
        slicing_specs {}
        slicing_specs { feature_keys: "%s" }
        options {
          compute_confidence_intervals { value: False }
          disabled_outputs{values: "analysis"}
        }
      """ % (LABEL_KEY, GROUP_KEY)

  eval_config = text_format.Parse(eval_config_pbtxt, tfma.EvalConfig())

  eval_shared_model = tfma.default_eval_shared_model(
      eval_saved_model_path=model_location, tags=[tf.saved_model.SERVING])

  # Describe how the flat image feature maps back to an IMAGE_SIZE x IMAGE_SIZE x 3 tensor.
  schema_pbtxt = """
        tensor_representation_group {
          key: ""
          value {
            tensor_representation {
              key: "%s"
              value {
                dense_tensor {
                  column_name: "%s"
                  shape {
                    dim { size: %s }
                    dim { size: %s }
                    dim { size: 3 }
                  }
                }
              }
            }
          }
        }
        feature {
          name: "%s"
          type: FLOAT
        }
        feature {
          name: "%s"
          type: FLOAT
        }
        feature {
          name: "%s"
          type: BYTES
        }
        """ % (IMAGE_KEY, IMAGE_KEY, IMAGE_SIZE, IMAGE_SIZE, IMAGE_KEY, LABEL_KEY, GROUP_KEY)
  schema = text_format.Parse(schema_pbtxt, schema_pb2.Schema())
  coder = tf_example_record.TFExampleBeamRecord(
      physical_format='inmem', schema=schema,
      raw_record_column_name=tfma.ARROW_INPUT_COLUMN)
  tensor_adapter_config = tensor_adapter.TensorAdapterConfig(
      arrow_schema=coder.ArrowSchema(),
      tensor_representations=coder.TensorRepresentations())
  # Run the fairness evaluation.
  with beam.Pipeline() as pipeline:
    _ = (
        tfds_as_pcollection(pipeline, 'celeb_a', 'test')
        | 'ExamplesToRecordBatch' >> coder.BeamSource()
        | 'ExtractEvaluateAndWriteResults' >>
        tfma.ExtractEvaluateAndWriteResults(
            eval_config=eval_config,
            eval_shared_model=eval_shared_model,
            output_path=tfma_eval_result_path,
            tensor_adapter_config=tensor_adapter_config)
    )
  return tfma.load_eval_result(output_path=tfma_eval_result_path)
%% Cell type:markdown id: tags:
# Train & Evaluate Model
With the model now defined and the input pipeline in place, we’re now ready to train our model. To cut back on the amount of execution time and memory, we will train the model by slicing the data into small batches with only a few repeated iterations.
Note that running this notebook in TensorFlow < 2.0.0 may result in a deprecation warning for `np.where`. Safely ignore this warning as TensorFlow addresses this in 2.X by using `tf.where` in place of `np.where`.
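How much data such a short run actually touches can be estimated with a little arithmetic. The sketch below is illustrative only: `BATCH_SIZE = 32` is an assumed value (the batch size is not stated in this section), the helper name `examples_seen` is hypothetical, and 162,770 is the size of CelebA's standard training partition.

```python
TRAIN_SET_SIZE = 162_770  # images in CelebA's standard training partition
BATCH_SIZE = 32           # assumption: the notebook's batch size is not shown here

def examples_seen(batch_size, steps_per_epoch, epochs):
    """Total number of training examples drawn from the (repeated) dataset."""
    return batch_size * steps_per_epoch * epochs

seen = examples_seen(BATCH_SIZE, steps_per_epoch=1000, epochs=2)
passes = seen / TRAIN_SET_SIZE

print(f"{seen} examples drawn, about {passes:.2f} passes over the training split")
```

Because the training dataset is repeated and `steps_per_epoch` is fixed, an "epoch" here means a fixed number of batches rather than one full pass over the data.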
%% Cell type:code id: tags:
# Set seeds to get reproducible results
set_seeds()
model_unconstrained = create_model()
model_unconstrained.fit(celeb_a_train_data_wo_group(BATCH_SIZE), epochs=2, steps_per_epoch=1000)
%% Cell type:markdown id: tags:
Evaluating the model on the test data should result in a final accuracy score of just over 90%. Not bad for a simple model with no fine tuning.
%% Cell type:code id: tags:
print('Overall Results, Unconstrained')
celeb_a_test_data = celeb_a_builder.as_dataset(split='test').batch(BATCH_SIZE).map(preprocess_input_dict).map(get_image_label_and_group)
results = model_unconstrained.evaluate(celeb_a_test_data)
%% Cell type:markdown id: tags:
However, performance evaluated across age groups may reveal some shortcomings.
To explore this further, we evaluate the model with Fairness Indicators (via TFMA). In particular, we are interested in seeing whether there is a significant gap in performance between "Young" and "Not Young" categories when evaluated on false positive rate.
A false positive error occurs when the model incorrectly predicts the positive class. In this context, a false positive outcome occurs when the ground truth is an image of a celebrity 'Not Smiling' and the model predicts 'Smiling'. By extension, the false positive rate, which is the metric used in the Fairness Indicators visualization, is the fraction of all 'Not Smiling' images that the model incorrectly labels as 'Smiling'. While this is a relatively mundane error to make in this context, false positive errors can sometimes cause more problematic behaviors. For instance, a false positive error in a spam classifier could cause a user to miss an important email.
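The false positive rate per group can be written out in a few lines of plain Python. This is a minimal sketch on made-up toy data, not the computation TFMA performs internally; the function names are hypothetical.

```python
def false_positive_rate(labels, predictions):
    """FP / (FP + TN): the share of actual negatives predicted as positive."""
    false_positives = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    true_negatives = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    negatives = false_positives + true_negatives
    return false_positives / negatives if negatives else float("nan")

def fpr_by_group(labels, predictions, groups):
    """Compute the false positive rate separately for every group value."""
    rates = {}
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        rates[group] = false_positive_rate([labels[i] for i in idx],
                                           [predictions[i] for i in idx])
    return rates

# Made-up toy data: 1 = "Smiling", 0 = "Not Smiling".
labels      = [0, 0, 0, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 1, 1, 1, 0]
groups      = ["Young"] * 4 + ["Not Young"] * 4

overall = false_positive_rate(labels, predictions)
rates = fpr_by_group(labels, predictions, groups)
print(overall, rates)
```

In this toy data the overall rate hides the fact that one group's rate is twice the other's, which is exactly the kind of gap the sliced evaluation is meant to surface.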
%% Cell type:code id: tags:
model_location = save_model(model_unconstrained, 'model_export_unconstrained')
eval_results_unconstrained = get_eval_results(model_location, 'eval_results_unconstrained')
%% Cell type:markdown id: tags:
As mentioned above, we are concentrating on the false positive rate. The current version of Fairness Indicators (0.1.2) selects false negative rate by default. After running the line below, deselect false_negative_rate and select false_positive_rate to look at the metric we are interested in.
%% Cell type:code id: tags:
%% Cell type:markdown id: tags:
As the results show above, we do see a **disproportionate gap between "Young" and "Not Young" categories**.
%% Cell type:markdown id: tags:
# Predict on WebCam image
In this last section, we want to give you the ability to experience the performance of the model yourself.
First, we take a look at some pictures from the test set. If the resulting number is greater than or equal to zero, the model predicts that the person is smiling.
%% Cell type:code id: tags:
from IPython.display import Image, display

def interpret_result(result):
  # The model outputs a raw score; zero is the decision boundary.
  if result >= 0.0:
    print("Smiling! :)")
  else:
    print("Not Smiling! :(")

# show a few examples from the test set
dataset = celeb_a_builder.as_dataset(split='test').batch(1).map(preprocess_input_dict).map(get_image_label_and_group)
for item in dataset.take(8):
  img = item[0]
  label = item[1].numpy()
  res = model_unconstrained.predict(img)[0]
  temp_name = 'img.png'
  tf.keras.preprocessing.image.save_img(temp_name, img[0])
  display(Image(temp_name, width=100, height=100))
  print("predicted:" + str(res))
  print("ground truth:" + str(label))
%% Cell type:markdown id: tags:
Now you can try it out by yourself.
First we define a function that accesses your WebCam.
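The browser hands the captured frame back to Python as a base64 data URL, which then has to be decoded into raw bytes before it can be written to a file. The sketch below shows that decoding step in isolation; the helper name `decode_data_url` and the payload are made up for illustration.

```python
from base64 import b64decode, b64encode

def decode_data_url(data_url):
    """Return the raw bytes carried by a base64 data URL."""
    header, encoded = data_url.split(",", 1)
    if not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    return b64decode(encoded)

# Round trip with a made-up payload standing in for JPEG bytes.
payload = b"not really a jpeg"
data_url = "data:image/jpeg;base64," + b64encode(payload).decode("ascii")

print(decode_data_url(data_url) == payload)
```

Everything before the first comma is metadata (media type and encoding); only the part after it is image data.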
%% Cell type:code id: tags:
#@title Define WebCam Handle { display-mode: "form" }
from IPython.display import display, Javascript
from google.colab.output import eval_js
from base64 import b64decode
def take_photo(filename='photo.jpg', quality=0.8):
  js = Javascript('''
    async function takePhoto(quality) {
      const div = document.createElement('div');
      const capture = document.createElement('button');
      capture.textContent = 'Capture';
      div.appendChild(capture);
      const video = document.createElement('video');
      video.style.display = 'block';
      const stream = await navigator.mediaDevices.getUserMedia({video: true});
      document.body.appendChild(div);
      div.appendChild(video);
      video.srcObject = stream;
      await video.play();
      // Resize the output to fit the video element.
      google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);
      // Wait for Capture to be clicked.
      await new Promise((resolve) => capture.onclick = resolve);
      const canvas = document.createElement('canvas');
      // not a very good crop :D
      canvas.width = video.videoHeight;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);
      // Release the camera and remove the preview once the frame is captured.
      stream.getVideoTracks()[0].stop();
      div.remove();
      return canvas.toDataURL('image/jpeg', quality);
    }
    ''')
  display(js)
  data = eval_js('takePhoto({})'.format(quality))
  # The data URL has the form "data:image/jpeg;base64,<payload>".
  binary = b64decode(data.split(',')[1])
  with open(filename, 'wb') as f:
    f.write(binary)
  return filename
%% Cell type:markdown id: tags:
Next we define a function that reads the image and processes it just like the training images. This is a very important step, because neural networks are very sensitive to data that differs from what they have seen during training.
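One easy mismatch to introduce is the pixel value range: the training images are scaled to [0, 1], while a freshly decoded photo holds 8-bit values from 0 to 255. The following sketch, with hypothetical helper names and made-up pixel values, shows a simple sanity check for that.

```python
def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0..255) to floats in [0, 1]."""
    return [value / 255.0 for value in pixels]

def looks_normalized(pixels):
    """Check that every value lies in the range used during training."""
    return all(0.0 <= value <= 1.0 for value in pixels)

raw = [0, 64, 128, 255]  # made-up 8-bit pixel values
scaled = normalize_pixels(raw)

print(looks_normalized(raw), looks_normalized(scaled))
```

A check like this is cheap to run on a new input before calling `predict`, and it catches a forgotten division by 255 right away.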
%% Cell type:code id: tags:
def read_and_process_img(file_path):
  # load the raw data from the file as a string
  img = tf.io.read_file(file_path)
  # convert the compressed string to a 3D uint8 tensor
  img = tf.image.decode_jpeg(img, channels=3)
  # resize the image to the desired size
  img = tf.cast(img, tf.float32)
  img = tf.image.resize(img, [IMAGE_SIZE, IMAGE_SIZE])
  # return a normalized image
  return img / 255.0
%% Cell type:markdown id: tags:
Now let's take that photo!
__Note:__ Make sure to accept the permissions!!
And don't forget to smile... :)