Commit efe1cdce authored by Paul Bethge

fix bug

parent 510a6305
%% Cell type:markdown id: tags:
# Copyright 2020 The TensorFlow Authors.
%% Cell type:code id: tags:
```
#@title Licensed under the Apache License, Version 2.0 (the "License"); { display-mode: "form" }
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```
%% Cell type:markdown id: tags:
# TensorFlow Fairness Indicators Example Using CelebA Dataset
%% Cell type:markdown id: tags:
<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_TFCO_CelebA_Case_Study"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />Original Code on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/zkmkarlsruhe/bias-workshop/blob/main/Fairness_Indicators_CelebA_Case_Study.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/zkmkarlsruhe/bias-workshop/blob/main/Fairness_Indicators_CelebA_Case_Study.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View on GitHub</a>
  </td>
</table>
%% Cell type:markdown id: tags:
This notebook explores bias in images using Google’s [Fairness Indicators](https://www.tensorflow.org/responsible_ai/fairness_indicators/guide). In particular, this notebook will:
* Train a simple neural network model to detect a person's smile in images using [`tf.keras`](https://www.tensorflow.org/guide/keras) and the large-scale CelebFaces Attributes ([CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)) dataset.
* Evaluate model performance against a commonly used fairness metric across age groups, using Fairness Indicators.
* Let you try out the model by taking a selfie.
%% Cell type:markdown id: tags:
# Acknowledgement
We gratefully thank TensorFlow for providing tools and examples to explore the topic of bias in Machine Learning (ML) applications.
We have introduced small changes to the provided notebook to suit the needs of this workshop. Please visit the [website](https://www.tensorflow.org/responsible_ai/) to learn more about how TensorFlow helps make AI more responsible (including detecting and mitigating bias). You can find the original notebook [here](https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_TFCO_CelebA_Case_Study).
Changes to the original code have been developed at [ZKM | Hertz-Lab](https://zkm.de/en/about-the-zkm/organization/hertz-lab) as part of the project [»The Intelligent Museum«](https://zkm.de/en/project/the-intelligent-museum), which is generously funded by the Digital Culture Programme of the [Kulturstiftung des Bundes](https://www.kulturstiftung-des-bundes.de/en) (German Federal Cultural Foundation). You can find other code developed as part of this project at [intelligent.museum/code](https://intelligent.museum/code).
%% Cell type:markdown id: tags:
# Installation and Import
This notebook was created in [Colaboratory](https://research.google.com/colaboratory/faq.html), connected to the Python 3 Google Compute Engine backend.
We will start by installing the necessary Python packages to get the required data (tensorflow-datasets), train a neural network (tensorflow) and evaluate it (fairness-indicators / tensorflow-model-analysis). Afterwards, we import specific modules from those libraries.
__Important:__ the very first time you run the pip installs, you may be asked to restart the runtime because of preinstalled out-of-date packages. Once you do so, the correct packages will be used.
%% Cell type:code id: tags:
```
#@title Pip installs
!pip install -q -U pip==20.2
!pip install -q tensorflow-datasets tensorflow
!pip install fairness-indicators \
  "absl-py==0.12.0" \
  "apache-beam<3,>=2.28" \
  "avro-python3==1.9.1" \
  "pyzmq==17.0.0"
```
%% Cell type:code id: tags:
```
#@title Import Modules { display-mode: "form" }
import os
import sys
import tempfile
import urllib

import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
import numpy as np

from tensorflow_metadata.proto.v0 import schema_pb2
from tfx_bsl.tfxio import tensor_adapter
from tfx_bsl.tfxio import tf_example_record
import tensorflow_model_analysis as tfma
import fairness_indicators as fi
from google.protobuf import text_format
import apache_beam as beam

# Enable Eager Execution and Print Versions
if tf.__version__ < "2.0.0":
  tf.compat.v1.enable_eager_execution()
  print("Eager execution enabled.")
else:
  print("Eager execution enabled by default.")

print("TensorFlow " + tf.__version__)
print("TFMA " + tfma.VERSION_STRING)
print("TFDS " + tfds.version.__version__)
print("FI " + fi.version.__version__)
```
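%% Cell type:markdown id: tags:
A side note on the version check in the cell above: `tf.__version__ < "2.0.0"` compares version *strings* lexicographically, which happens to work for TensorFlow 1.x vs 2.x but can mislead once a component has more than one digit. A minimal sketch of a more robust numeric comparison (the `version_tuple` helper is our own illustration, not part of the notebook or TensorFlow):
```python
def version_tuple(v):
    """Turn a version string like '2.8.0' into (2, 8, 0) for numeric comparison."""
    return tuple(int(part) for part in v.split(".")[:3] if part.isdigit())

# Lexicographic string comparison can be wrong for multi-digit components:
print("10.0.0" < "2.0.0")                                # True (misleading)
# Tuple comparison is numeric and gives the expected answer:
print(version_tuple("10.0.0") < version_tuple("2.0.0"))  # False
```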
%% Cell type:markdown id: tags:
# CelebA Dataset
[CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) is a large-scale face attributes dataset with more than 200,000 celebrity images, each with 40 attribute annotations (such as hair type, fashion accessories, facial features, etc.) and 5 landmark locations (eyes, mouth and nose positions). For more details take a look at [the paper](https://liuziwei7.github.io/projects/FaceAttributes.html).
With the permission of the owners, we have stored this dataset on Google Cloud Storage and mostly access it via [TensorFlow Datasets (`tfds`)](https://www.tensorflow.org/datasets).
In this notebook:
* Our model will attempt to classify whether the subject of the image is smiling, as represented by the "Smiling" attribute<sup>*</sup>.
* Images will be resized from 218x178 to 64x64 to reduce the execution time and memory when training.
* Our model's performance will be evaluated across age groups, using the binary "Young" attribute. We will call this "age group" in this notebook.
___
<sup>*</sup> While there is little information available about the labeling methodology for this dataset, we will assume that the "Smiling" attribute was determined by a pleased, kind, or amused expression on the subject's face. For the purpose of this case study, we will take these labels as ground truth.
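%% Cell type:markdown id: tags:
To get a feel for how much the resize step shrinks the input, here is a quick back-of-the-envelope calculation (plain Python, independent of the notebook's pipeline):
```python
# Original CelebA images are 218x178 RGB; we downscale them to 64x64.
orig_pixels = 218 * 178 * 3    # values per original image
small_pixels = 64 * 64 * 3     # values per resized image

print(orig_pixels)                           # 116412
print(small_pixels)                          # 12288
print(round(orig_pixels / small_pixels, 1))  # 9.5 -> roughly 9.5x fewer values
```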
%% Cell type:code id: tags:
```
gcs_base_dir = "gs://celeb_a_dataset/"
celeb_a_builder = tfds.builder("celeb_a", data_dir=gcs_base_dir, version='2.0.0')
celeb_a_builder.download_and_prepare()

num_test_shards_dict = {'0.3.0': 4, '2.0.0': 2} # Used because we download the test dataset separately
version = str(celeb_a_builder.info.version)
print('Celeb_A dataset version: %s' % version)

local_root = tempfile.mkdtemp(prefix='test-data')

def local_test_filename_base():
  return local_root

def local_test_file_full_prefix():
  return os.path.join(local_test_filename_base(), "celeb_a-test.tfrecord")

def copy_test_files_to_local():
  filename_base = local_test_file_full_prefix()
  num_test_shards = num_test_shards_dict[version]
  for shard in range(num_test_shards):
    url = "https://storage.googleapis.com/celeb_a_dataset/celeb_a/%s/celeb_a-test.tfrecord-0000%s-of-0000%s" % (version, shard, num_test_shards)
    filename = "%s-0000%s-of-0000%s" % (filename_base, shard, num_test_shards)
    res = urllib.request.urlretrieve(url, filename)
```
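%% Cell type:markdown id: tags:
The loop above reconstructs TFRecord shard names of the form `celeb_a-test.tfrecord-00000-of-00002`. Note that the hardcoded `0000%s` padding only produces correctly padded names while the shard index and shard count stay single-digit, which holds for the versions in `num_test_shards_dict`. A small standalone sketch of the pattern (the literal values are just for illustration):
```python
num_test_shards = 2  # as for dataset version '2.0.0' above
for shard in range(num_test_shards):
    # '0000%s' pads to five digits only because shard and count are single-digit.
    name = "celeb_a-test.tfrecord-0000%s-of-0000%s" % (shard, num_test_shards)
    print(name)
# celeb_a-test.tfrecord-00000-of-00002
# celeb_a-test.tfrecord-00001-of-00002
```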
%% Cell type:markdown id: tags:
## Caveats
Before moving forward, there are several considerations to keep in mind in using CelebA:
* Although in principle this notebook could use any dataset of face images, CelebA was chosen because it contains public domain images of public figures.
* All of the attribute annotations in CelebA are operationalized as binary categories. For example, the "Young" attribute (as determined by the dataset labelers) is denoted as either present or absent in the image.
* CelebA's categorizations do not reflect real human diversity of attributes.
* For the purposes of this notebook, the feature containing the "Young" attribute is referred to as "age group", where the presence of the "Young" attribute in an image is labeled as a member of the "Young" age group and the absence of the "Young" attribute is labeled as a member of the "Not Young" age group. These are assumptions made as this information is not mentioned in the [original paper](http://openaccess.thecvf.com/content_iccv_2015/html/Liu_Deep_Learning_Face_ICCV_2015_paper.html).
* As such, performance in the models trained in this notebook is tied to the ways the attributes have been operationalized and annotated by the authors of CelebA.
* This model should not be used for commercial purposes as that would violate [CelebA's non-commercial research agreement](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html).
%% Cell type:markdown id: tags:
# Defining the Challenge
In this code block we set the hyperparameters that define the problem we are trying to solve.
The value of `IMAGE_SIZE` determines the width and height of the image we feed into the neural network. The smaller this number, the faster, but also the more imprecise, our classification algorithm gets.
`LABEL_KEY` determines the attribute we train our classifier on (e.g. does a person have a mustache, smile, wear a hat or not?),
while `GROUP_KEY` defines the groups we evaluate on (e.g. male, young, chubby). Keep in mind that these are only binary attributes - the absence of the "Male" attribute presumably denotes the female one.
You can find the 40 different attributes in [this table](https://www.researchgate.net/figure/List-of-the-40-face-attributes-provided-with-the-CelebA-database_tbl1_327029519).
__Note:__ after completing this exercise, feel free to play around with these variables.
%% Cell type:code id: tags:
```
IMAGE_SIZE = 64
LABEL_KEY = "Smiling"
GROUP_KEY = "Young"
```
%% Cell type:markdown id: tags:
# Setting Up Input Functions
The subsequent cells will help streamline the input pipeline as well as visualize performance.
%% Cell type:code id: tags:
```
#@title Define Preprocessing and Dataset Functions { display-mode: "form" }
ATTR_KEY = "attributes"
IMAGE_KEY = "image"

def preprocess_input_dict(feat_dict):
  # Separate out the image and target variable from the feature dictionary.
  image = feat_dict[IMAGE_KEY]
  label = feat_dict[ATTR_KEY][LABEL_KEY]
  group = feat_dict[ATTR_KEY][GROUP_KEY]

  # Resize and normalize image.
  image = tf.cast(image, tf.float32)
  image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE])
  image /= 255.0

  # Cast label and group to float32.
  label = tf.cast(label, tf.float32)
  group = tf.cast(group, tf.float32)

  feat_dict[IMAGE_KEY] = image
  feat_dict[ATTR_KEY][LABEL_KEY] = label
  feat_dict[ATTR_KEY][GROUP_KEY] = group

  return feat_dict

get_image_and_label = lambda feat_dict: (feat_dict[IMAGE_KEY], feat_dict[ATTR_KEY][LABEL_KEY])
get_image_label_and_group = lambda feat_dict: (feat_dict[IMAGE_KEY], feat_dict[ATTR_KEY][LABEL_KEY], feat_dict[ATTR_KEY][GROUP_KEY])

# Train data returning either 2 or 3 elements (the third element being the group)
def celeb_a_train_data_wo_group(batch_size):
  celeb_a_train_data = celeb_a_builder.as_dataset(split='train').shuffle(1024).repeat().batch(batch_size).map(preprocess_input_dict)
  return celeb_a_train_data.map(get_image_and_label)

def celeb_a_train_data_w_group(batch_size):
  celeb_a_train_data = celeb_a_builder.as_dataset(split='train').shuffle(1024).repeat().batch(batch_size).map(preprocess_input_dict)
  return celeb_a_train_data.map(get_image_label_and_group)

# Test data for the overall evaluation
celeb_a_test_data = celeb_a_builder.as_dataset(split='test').batch(1).map(preprocess_input_dict).map(get_image_label_and_group)

# Copy test data locally to be able to read it into tfma
copy_test_files_to_local()
```
%% Cell type:markdown id: tags:
# Build a simple CNN model
In this next block of code we define an Artificial Neural Network with several different layers. Those layers include convolutional filters, pooling and fully-connected layers. We may be able to greatly improve model performance by adding more complexity (e.g., more densely-connected layers, different activation functions, a larger image size, different architectures, regularization methods, ...), but that may distract from the goal of demonstrating how bias manifests itself in ML models. For that reason, the model will be kept simple - but feel encouraged to explore this space.
%% Cell type:code id: tags:
```
from tensorflow.keras import layers

def create_model():
  # For this notebook, accuracy will be used to evaluate performance.
  METRICS = [
    tf.keras.metrics.BinaryAccuracy(name='accuracy')
  ]

  model = keras.Sequential([
    layers.InputLayer(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), name='image'),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation=None)
  ])

  model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss='hinge',
    metrics=METRICS)

  return model
```
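%% Cell type:markdown id: tags:
Notice that the final `Dense` layer has no activation: the model outputs a raw score, which fits the `'hinge'` loss used in `model.compile`. Per the Keras documentation, hinge loss expects labels in {-1, +1} (binary 0/1 labels are converted internally). A minimal NumPy sketch of the hinge formula itself, max(0, 1 - y_true * y_pred) - this is an illustration, not the Keras implementation:
```python
import numpy as np

def hinge_loss(y_true, y_pred):
    """Elementwise hinge loss: max(0, 1 - y_true * y_pred).
    y_true is expected in {-1, +1}; y_pred is the model's raw score."""
    return np.maximum(0.0, 1.0 - y_true * y_pred)

y_true = np.array([1.0, 1.0, -1.0, -1.0])
y_pred = np.array([2.0, 0.3, -0.5, 1.5])  # raw scores from the final Dense layer

# Confident correct predictions (margin >= 1) incur zero loss;
# weak or wrong predictions are penalized linearly.
print(hinge_loss(y_true, y_pred))  # [0.  0.7 0.5 2.5]
```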
%% Cell type:markdown id: tags:
We also define a function to set seeds to ensure reproducible results. Note that this colab is meant as an educational tool and does not have the stability of a finely tuned production pipeline. Running without setting a seed may lead to varied results.
%% Cell type:code id: tags:
```
def set_seeds():
  np.random.seed(121212)
  tf.compat.v1.set_random_seed(212121)
```
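%% Cell type:markdown id: tags:
As a quick illustration of why seeding matters (plain NumPy, independent of the cell above): reseeding the generator with the same value replays exactly the same "random" sequence, which is what makes a seeded run reproducible.
```python
import numpy as np

np.random.seed(121212)
first_draw = np.random.rand(3)

np.random.seed(121212)  # reseed with the same value...
second_draw = np.random.rand(3)

# ...and the "random" numbers repeat exactly.
print(np.array_equal(first_draw, second_draw))  # True
```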
%% Cell type:markdown id: tags:
# Fairness Indicators Helper Functions
Before training our model, we define a number of helper functions that will allow us to evaluate the model's performance via Fairness Indicators.
%% Cell type:markdown id: tags:
First, we create a helper function to save our model once we train it.
%% Cell type:code id: tags:
```
#@title Save Model function { display-mode: "form" }
def save_model(model, subdir):
  base_dir = tempfile.mkdtemp(prefix='saved_models')
  model_location = os.path.join(base_dir, subdir)
  model.save(model_location, save_format='tf')
  return model_location
```
%% Cell type:markdown id: tags:
Next, we define functions used to preprocess the data in order to correctly pass it through to TFMA.
%% Cell type:code id: tags:
```
#@title Data Preprocessing functions { display-mode: "form" }
def tfds_filepattern_for_split(dataset_name, split):
  return f"{local_test_file_full_prefix()}*"

class PreprocessCelebA(object):
  """Class that deserializes, decodes and applies additional preprocessing for CelebA input."""

  def __init__(self, dataset_name):
    builder = tfds.builder(dataset_name)
    self.features = builder.info.features
    example_specs = self.features.get_serialized_info()
    self.parser = tfds.core.example_parser.ExampleParser(example_specs)

  def __call__(self, serialized_example):
    # Deserialize
    deserialized_example = self.parser.parse_example(serialized_example)
    # Decode
    decoded_example = self.features.decode_example(deserialized_example)
    # Additional preprocessing
    image = decoded_example[IMAGE_KEY]
    label = decoded_example[ATTR_KEY][LABEL_KEY]
    # Resize and scale image.
    image = tf.cast(image, tf.float32)
    image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE])