Inside TensorFlow

It’s probably not surprising that Yelp utilizes deep neural networks in its quest to connect people with great local businesses. One example is the selection of photos you see in the Yelp app and website, where neural networks try to identify the best quality photos for the business displayed. A crucial component of our deep learning stack is TensorFlow (TF). In the process of deploying TF to production, we’ve learned a few things that may not be commonly known in the Data Science community.

TensorFlow’s success stems not only from its popularity within the machine learning domain, but also from its design. It’s very well-written and has been extensively tested and documented (you can read the documentation offline by simply cloning its repository). You don’t have to be a machine learning expert to enjoy reading it, and even experienced software engineers can learn a thing or two from it.

Building TensorFlow

You can start using TF without any extra build steps by installing the Python package from pypi.org. This is straightforward, but the prebuilt binary is not compiled for your specific CPU, so you won’t benefit from optimizations such as newer vector instructions. Here’s an example of what this can look like in practice:

$ python3 -c 'import tensorflow as tf; tf.Session().list_devices()' 2>&1 | grep -oE 'Your CPU .*'
Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
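
To check ahead of time which of these instruction sets a machine offers, one option is to read the CPU flags that Linux exposes. The following is a minimal sketch, assuming an x86 Linux host where /proc/cpuinfo contains a "flags" line; the function names are made up for illustration and are not part of TF.

```python
import re

# Instruction sets named in the warning above, lower-cased as /proc/cpuinfo lists them.
WANTED = ("avx2", "avx512f", "fma")


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    match = re.search(r"^flags\s*:\s*(.*)$", cpuinfo_text, flags=re.MULTILINE)
    return set(match.group(1).split()) if match else set()


def supported(cpuinfo_text, wanted=WANTED):
    """Map each wanted instruction set to True/False for this CPU."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(supported(f.read()))
    except OSError:
        print("no readable /proc/cpuinfo on this system")
```

If every value comes back False, a custom build is unlikely to gain much from these particular instruction sets on that machine.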

If you want to hack on TF (the second part of this post explains how), you’ll have to build the package yourself in order to test your changes. So, assuming you’re interested in building TF for your own requirements, or perhaps with your own code changes, here’s a compilation of hints on how to make it a relatively painless experience. Note: this is not a step-by-step recipe; obvious points (like “copy the sources” and “read the documentation”) are not included!

We recommend building TensorFlow inside a container, using a tool such as Docker or Podman. The TF project uses Docker for both continuous integration (CI) and official images. You’ll find Dockerfiles and documentation for the latter in the tensorflow/tools/dockerfiles directory. However, it is the CI setup that is of more interest in the context of building TF, so make sure to read tensorflow/tools/ci_build/README.md and check out the other files in that directory. Using containers to build TF makes it easier to consistently install all required packages and helps ensure the builds are reproducible (a critical requirement of CI).

A major required package for building TF is the Bazel build system. (It’s possible, but not recommended, to use make instead of Bazel; for instructions, see tensorflow/contrib/make/README.md.) In addition to Bazel, other TF dependencies can be found inside the configure.py script (in the project root directory). TF also depends on a number of Python packages, all of which are listed inside the tensorflow/tools/pip_package/setup.py file (look for REQUIRED_PACKAGES). Important among those is NumPy, which may require you to install an extra package in the operating system, such as the libatlas3-base package for Ubuntu users. Additionally, if you want to build TF for GPU, you’ll need either CUDA with cuDNN (for NVIDIA) or ROCm (for AMD, which we have not tried) installed inside your container. The simplest way to ensure that all CUDA dependencies are present is to use the official NVIDIA images as your container base, as demonstrated in the tensorflow/tools/ci_build/Dockerfile.gpu file.

You’ll need to execute configure.py before the actual build. The script will ask many questions, such as “Please specify which C compiler should be used.” For a scripted build, the answers to all questions can be automated with “yes |” (as demonstrated in tensorflow/tools/ci_build/builds/configured). Also, if you read the configure.py source, you’ll quickly discover that individual questions can be suppressed with environment variables, such as HOST_C_COMPILER. Among these, a very useful variable is CC_OPT_FLAGS, which by default contains “-march=native -Wno-sign-compare”. If you want to use the resulting package on a CPU model different from the one where you run your build, you should replace “native” with a more appropriate value. The output of configure.py is the .tf_configure.bazelrc file, which you may want to look into.
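
For a scripted build driven from Python, the environment variables mentioned above can be assembled in one place. This is only a sketch: the helper name and its parameters are hypothetical, and "haswell" is just an example target. The only variables it sets are CC_OPT_FLAGS and TF_NEED_CUDA, both of which appear elsewhere in this post.

```python
import os


def configure_env(march="haswell", need_cuda=False, base_env=None):
    """Build the environment for a non-interactive configure run.

    `march` is an illustrative parameter: choose the oldest CPU family
    that the resulting package has to run on.
    """
    env = dict(os.environ if base_env is None else base_env)
    # The default is "-march=native -Wno-sign-compare"; replace "native"
    # so the binary also runs on machines other than the build host.
    env["CC_OPT_FLAGS"] = "-march={} -Wno-sign-compare".format(march)
    # Answers the CUDA question up front instead of interactively.
    env["TF_NEED_CUDA"] = "1" if need_cuda else "0"
    return env


if __name__ == "__main__":
    env = configure_env()
    print(env["CC_OPT_FLAGS"], env["TF_NEED_CUDA"])
```

The returned dictionary can be passed as the env argument of subprocess.run when invoking the configure step, so the build host’s own shell environment stays untouched.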

After the initial configuration step, you’ll need to run “bazel build” with options to build TF binaries (but not its Python wheel - yet!). The selection of Bazel options can be a little tricky, but the script tensorflow/tools/ci_build/ci_build.sh may give you some ideas. The build typically takes between 30 and 60 minutes (or longer when CUDA is enabled) on 40 CPUs - it is quite a large project! After this step is completed, you still need to build the Python wheels. As explained in the documentation, this step is actually performed by the “build_pip_package” binary you’ve just built!

Here’s an example of what the above steps may look like in a Dockerfile:

RUN curl -L https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VER}/bazel-${BAZEL_VER}-installer-linux-x86_64.sh --output bazel.sh && \
    bash bazel.sh --prefix=/opt/bazel && \
    rm bazel.sh
ENV PATH ${PATH}:/opt/bazel/bin

RUN curl -L https://github.com/tensorflow/tensorflow/archive/${VERSION}.tar.gz | tar xz --strip-components=1
ENV TF_NEED_CUDA 0
ENV CC_OPT_FLAGS -mtune=intel -march=haswell -Wno-sign-compare
RUN tensorflow/tools/ci_build/builds/configured CPU
RUN cat .tf_configure.bazelrc
RUN bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
RUN bazel-bin/tensorflow/tools/pip_package/build_pip_package /tensorflow

This of course implies that you’ll actually build TF with “docker build”. This may seem counterintuitive at first (running Bazel in the context of “docker run” will be a more natural choice to some, and is in fact required for incremental builds), but it is quite useful: Docker’s layer cache lets you re-run the build very quickly if no changes have been made, and you don’t have to worry about the build directory. Just remember to “docker run” with the --user option afterwards to copy your Python wheels out of the container image.

TensorFlow project structure

There are two important top-level directories in the TF project: tensorflow and third_party. The latter contains TF dependencies (which you may want to check out). The list is rather extensive, and some third-party libraries can alternatively be brought in as system dependencies (you can see them inside third_party/systemlibs/syslibs_configure.bzl), but our focus is going to be on the tensorflow directory. It may not be immediately apparent, but most of the TF functionality is, at the lowest level, implemented in C++. This is what the tensorflow/core directory is for. This low-level functionality is then exported as a public API to various programming languages inside directories named after each language. Most TF users are familiar with the Python API inside the tensorflow/python directory, but there are also subdirectories for C, C++, Java and Go.

Knowing your way around the Python subdirectory can help you find useful pieces of information without the need to seek external documentation. For example, to find the constants used by the selu activation, you can look in tensorflow/python/keras/activations.py. Another useful Python subdirectory is debug. If you’ve ever wondered what the computation graph of your deep learning model looks like, then the file tensorflow/python/debug/README.md is a good place to start. There are also some very useful tools inside the (you guessed it!) tensorflow/python/tools directory.
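
To make the selu example concrete, here is a small standalone sketch of the activation using the two published SELU constants. It is plain Python for a single float, written for illustration only; it is not the code from activations.py, which operates on tensors.

```python
import math

# Published SELU constants (rounded to double precision).
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit for a single float."""
    if x > 0:
        return SCALE * x
    # expm1(x) computes exp(x) - 1 accurately for small |x|.
    return SCALE * ALPHA * math.expm1(x)


if __name__ == "__main__":
    for value in (-1.0, 0.0, 1.0):
        print(value, selu(value))
```

Positive inputs are simply scaled, while negative inputs saturate towards -SCALE * ALPHA.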

Some C++ functions are imported by Python with the SWIG file tensorflow/python/tensorflow.i, which in turn includes *.i files in various subdirectories. As you’ll see, most of these files have an accompanying *.cc file with the implementation, which in turn includes headers from the tensorflow/core directory (and also from the tensorflow/c public API directory). However, SWIG is only used for low-level functions, and TF focuses mostly on high-level operations. These are coded and registered in the tensorflow/core directory as so-called “ops” (look for the REGISTER_OP macro; the majority of ops are inside the ops subdirectory). Ops are imported by language APIs using their name. Note that in Python, the spelling of each op is changed, replacing CamelCase with snake_case (for example, ApplyGradientDescent from tensorflow/core/ops/training_ops.cc is imported inside tensorflow/python/training/gradient_descent.py as apply_gradient_descent). Other language APIs refer to ops using the original CamelCase names.
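
When searching the Python sources for an op, it helps to be able to guess its snake_case spelling. The function below is a simplified approximation written for this post, not the conversion TF itself uses, so unusual op names may come out differently.

```python
import re


def op_name_to_snake_case(name):
    """Approximate the Python spelling of a CamelCase op name.

    Simplified rule: put "_" between a lowercase letter and the
    uppercase letter that follows it, then lower-case everything.
    """
    return re.sub(r"([a-z])([A-Z])", r"\1_\2", name).lower()


if __name__ == "__main__":
    print(op_name_to_snake_case("ApplyGradientDescent"))  # apply_gradient_descent
```

With the guessed name in hand, a plain text search under tensorflow/python is usually enough to locate where the op is used.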

The C++ implementation of each op is coded in the so-called “kernel” (there can be separate kernels for CPU and GPU as demonstrated in tensorflow/core/kernels/fact_op.cc), which is then mapped to an op with a REGISTER_KERNEL_BUILDER macro. Most kernels reside inside the tensorflow/core/kernels directory. For example, ApplyGradientDescent is implemented in tensorflow/core/kernels/training_ops.cc. Unit tests for kernels are written in Python and reside either inside the tensorflow/python/kernel_tests directory or next to their Python API wrapper, in “*_test.py” files. For example, unit tests for ApplyGradientDescent are coded in tensorflow/python/training/training_ops_test.py.
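
To locate where an op and its kernels are registered, a text search over a source checkout goes a long way. The sketch below is a heuristic only: the function name is hypothetical, it matches the two macros literally, and it will miss registrations whose op name is assembled by other macros.

```python
import re
from pathlib import Path

# Captures the op name in REGISTER_OP("Name").
REGISTER_OP_RE = re.compile(r'REGISTER_OP\(\s*"(\w+)"\s*\)')
# Captures the op name in REGISTER_KERNEL_BUILDER(Name("OpName") ...;
# whitespace and line-continuation backslashes may precede Name(.
REGISTER_KERNEL_RE = re.compile(
    r'REGISTER_KERNEL_BUILDER\([\s\\]*Name\(\s*"(\w+)"\s*\)'
)


def find_registrations(root, op_name):
    """Return (op_files, kernel_files): .cc files under root that register op_name."""
    op_files, kernel_files = [], []
    for path in sorted(Path(root).rglob("*.cc")):
        text = path.read_text(errors="ignore")
        if op_name in REGISTER_OP_RE.findall(text):
            op_files.append(path)
        if op_name in REGISTER_KERNEL_RE.findall(text):
            kernel_files.append(path)
    return op_files, kernel_files


if __name__ == "__main__":
    # Assumes the current directory is the root of a TF source checkout.
    ops, kernels = find_registrations("tensorflow/core", "ApplyGradientDescent")
    print("op registered in:", [str(p) for p in ops])
    print("kernels registered in:", [str(p) for p in kernels])
```

Scanning all of tensorflow/core reads every .cc file, so narrowing the root to the ops or kernels subdirectory keeps the search quick.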

A complete list of ops is available in two locations: the tensorflow/core/api_def directory and the tensorflow/core/ops/ops.pbtxt file. As you can see, TF defines a considerable number of ops, which explains the large size of its binary. When building TF, you can minimize its size by enabling only selected ops. This is documented inside the tensorflow/core/framework/selective_registration.h file (note that this is an experimental feature). Interestingly, you don’t need to maintain a fork of TF if you want to add your own custom ops. Instead, TF’s design allows an external project to extend TF with new functionality. This is demonstrated in the TensorFlow Addons project.
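
To get a feel for how many ops there are, the names can be pulled out of ops.pbtxt with a few lines of Python. This sketch assumes the usual protobuf text layout, in which each entry starts with an "op {" line followed by an indented name line; it does not parse the protobuf properly, and the function name is made up for illustration.

```python
import re

# A top-level op name directly follows "op {" at two-space indent;
# names of nested attrs and args are indented deeper and are skipped.
OP_NAME_RE = re.compile(r'^op \{\n  name: "(\w+)"', flags=re.MULTILINE)


def list_op_names(pbtxt_text):
    """Return op names in the order they appear in ops.pbtxt text."""
    return OP_NAME_RE.findall(pbtxt_text)


if __name__ == "__main__":
    try:
        # Assumes the current directory is the root of a TF source checkout.
        with open("tensorflow/core/ops/ops.pbtxt") as f:
            print(len(list_op_names(f.read())), "ops found")
    except OSError:
        print("ops.pbtxt not found; run this from a TF source checkout")
```

A list like this is also a convenient starting point when deciding which ops to keep in a size-reduced build.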

You may also want to check the content of the tensorflow/core/platform directory. There, you can find files that are not specific to TensorFlow, but rather implement low-level operating system or network protocol functionality. Files shared by all platforms reside in this directory, but there are also several platform-specific subdirectories. For example, if you’re troubleshooting an S3-related issue, there’s an “S3” subdirectory to help you. This code is very well-written and potentially useful outside of the TF project (but please do check the license first!). Finally, for a high-level overview of the TensorFlow architecture, we recommend you check the official documentation.

We hope you’ll find this collection of hints useful when playing with TensorFlow or deploying it in your machine learning workflow!

Note

Neither Yelp nor the author of this post is affiliated with Google or the TensorFlow authors.
