Module 3: Migrate from Google Cloud NDB to Cloud Datastore

1. Overview

This series of codelabs (self-paced, hands-on tutorials) aims to help Google App Engine (standard environment) developers modernize their apps by guiding them through a series of migrations. The most significant step is to move away from original runtime bundled services because the next generation runtimes are more flexible, giving users a greater variety of service options. Moving to the newer generation runtime enables you to integrate with Google Cloud products more easily, use a wider range of supported services, and support current language releases.

This optional tutorial shows developers how to migrate from Cloud NDB to Cloud Datastore as the client library to talk to the Datastore service. Developers who prefer NDB can stay with it because it is Python 3 compatible, which is why this migration is optional. It is mainly for those who wish to build a consistent codebase and shared libraries with other apps already using Cloud Datastore, as explained in the "Background" section.

You'll learn how to

  • Use Cloud NDB (if you're unfamiliar with it)
  • Migrate from Cloud NDB to Cloud Datastore
  • Further migrate your app to Python 3

What you'll need

  • A Google Cloud Platform project with an active GCP billing account
  • Basic Python skills
  • Working knowledge of basic Linux commands
  • Basic knowledge of developing and deploying App Engine apps
  • A working Module 2 App Engine 2.x or 3.x app.

2. Background

While Cloud NDB is a great Datastore solution for long-time App Engine developers and helps with transitioning to Python 3, it is not the only way App Engine developers can access Datastore. When App Engine's Datastore became its own product, Google Cloud Datastore, in 2013, a new client library was created so that all users could use Datastore.

Python 3 App Engine and non-App Engine developers are directed to use Cloud Datastore (not Cloud NDB). Python 2 App Engine developers are encouraged to migrate from ndb to Cloud NDB and port to Python 3 from there, but they can also choose to migrate further to Cloud Datastore. This is a logical decision, especially for developers who already have code using Cloud Datastore, such as the ones just mentioned, and wish to create shared libraries across all their applications. Code reuse is a best practice, as is code consistency, and both contribute to reduced overall maintenance cost, as summarized here:

Migration from Cloud NDB to Cloud Datastore

  • Allows developers to focus on a single codebase for Datastore access
  • Avoids maintaining some code using Cloud NDB and others using Cloud Datastore
  • Provides more consistency in codebase and better code reusability
  • Enables use of common/shared libraries, which contribute to lower overall maintenance cost

This migration features these primary steps:

  1. Setup/Prework
  2. Replace Cloud NDB with Cloud Datastore client libraries
  3. Update application

3. Setup/Prework

Before we get going with the main part of the tutorial, let's set up our project, get the code, then deploy the baseline app so we know we started with working code.

1. Setup project

If you completed the Module 2 codelab, we recommend reusing that same project (and code). Alternatively, you can create a brand new project or reuse another existing project. Ensure the project has an active billing account and App Engine (app) is enabled.

2. Get baseline sample app

One of the prerequisites is to have a working Module 2 sample app. Use your solution if you completed that tutorial. You can complete it now (link above), or if you wish to skip it, then copy the Module 2 repo (link below).

Whether you use yours or ours, the Module 2 code is where we'll START. This Module 3 codelab walks you through each step, and when complete, it should resemble code at the FINISH point. There are Python 2 and 3 versions of this tutorial, so grab the correct code repo below.

Python 2

The directory of Python 2 Module 2 STARTing files (yours or ours) should look like this:

$ ls
README.md               appengine_config.py     requirements.txt
app.yaml                main.py                 templates

If you completed the Module 2 tutorial, you'll also have a lib folder with Flask and its dependencies. If you don't have a lib folder, create it with the pip install -t lib -r requirements.txt command so that we can deploy this baseline app in the next step. If you have both Python 2 and 3 installed, we recommend using pip2 instead of pip to avoid confusion with Python 3.

Python 3

The directory of Python 3 Module 2 STARTing files (yours or ours) should look like this:

$ ls
README.md               main.py                 templates
app.yaml                requirements.txt

Neither lib nor appengine_config.py are used for Python 3.

3. (Re)Deploy Module 2 app

Your remaining prework steps to execute now:

  1. Re-familiarize yourself with the gcloud command-line tool (if necessary)
  2. (Re)deploy the Module 2 code to App Engine (if necessary)

Once you've successfully executed those steps and confirmed the app is operational, we'll move ahead in this tutorial, starting with the configuration files.

4. Replace Cloud NDB with Cloud Datastore client libraries

The only configuration change is a minor package swap in your requirements.txt file.

1. Update requirements.txt

Upon completing Module 2, your requirements.txt file looked like this:

  • BEFORE (Python 2 and 3):
Flask==1.1.2
google-cloud-ndb==1.7.1

Update requirements.txt by replacing the Cloud NDB library (google-cloud-ndb) with the latest version of the Cloud Datastore library (google-cloud-datastore), leaving the entry for Flask intact, bearing in mind the final version of Cloud Datastore that's Python 2 compatible is 1.15.3:

  • AFTER (Python 2):
Flask==1.1.2
google-cloud-datastore==1.15.3
  • AFTER (Python 3):
Flask==1.1.2
google-cloud-datastore==2.1.0

Keep in mind that the repo is maintained more regularly than this tutorial, so it's possible the requirements.txt file might reflect newer versions. We recommend using the latest versions of each library, but if they don't work, you can roll back to an older release. The version numbers above were the latest when this codelab was last updated.
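If you maintain several apps and want to script the package swap rather than edit each file by hand, the idea can be sketched as below. This is only an illustration under stated assumptions: swap_ndb_for_datastore() is a hypothetical helper (not part of any Google library), and it assumes one package pin per line, as in the files shown above.

```python
import re

def swap_ndb_for_datastore(requirements_text, datastore_version):
    """Return requirements text with the google-cloud-ndb line replaced
    by a google-cloud-datastore pin (hypothetical helper)."""
    lines = []
    for line in requirements_text.splitlines():
        # Only the Cloud NDB entry changes; Flask and others stay intact.
        if re.match(r'\s*google-cloud-ndb\b', line):
            line = 'google-cloud-datastore=={}'.format(datastore_version)
        lines.append(line)
    return '\n'.join(lines) + '\n'

before = 'Flask==1.1.2\ngoogle-cloud-ndb==1.7.1\n'
after_py2 = swap_ndb_for_datastore(before, '1.15.3')  # last 2.x-compatible release
after_py3 = swap_ndb_for_datastore(before, '2.1.0')   # or a newer release
```

The version strings are passed in rather than hard-coded because the latest Python 3 release may be newer than the one named in this tutorial.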

2. Other configuration files

The other configuration files, app.yaml and (for Python 2) appengine_config.py, should remain unchanged from the previous migration step. For Python 2:

  • app.yaml should (still) reference the 3rd-party bundled packages grpcio and setuptools.
  • appengine_config.py should (still) point pkg_resources and google.appengine.ext.vendor to the 3rd-party resources in lib.

Now let's move to the application files.

5. Update application files

There are no changes to templates/index.html, but there are a few updates for main.py.

1. Imports

The starting code for the import section should look as follows:

  • BEFORE:
from flask import Flask, render_template, request
from google.cloud import ndb

Replace the google.cloud.ndb import with one for Cloud Datastore: google.cloud.datastore. Because the Datastore client library does not support auto-creation of a timestamp field in an Entity, also import the standard library datetime module to create one manually. By convention, standard library imports go above third-party package imports. When you're done with these changes, it should look like this:

  • AFTER:
from datetime import datetime
from flask import Flask, render_template, request
from google.cloud import datastore

2. Initialization and data model

After initializing Flask, the Module 2 sample app creates an NDB client and declares an NDB data model class with its fields, which look as follows:

  • BEFORE:
app = Flask(__name__)
ds_client = ndb.Client()

class Visit(ndb.Model):
    visitor   = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)

The Cloud Datastore library does not have such a class, so delete the Visit class declaration. You still need a client to talk to Datastore, so change ndb.Client() to datastore.Client(). The Datastore library is more "flexible," allowing you to create Entities without "pre-declaring" their structure like NDB. After this update, this part of main.py should look like:

  • AFTER:
app = Flask(__name__)
ds_client = datastore.Client()
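Because there is no longer a model class declaring the fields, nothing in the client library checks the shape of what you store. If you want a lightweight check, one possible approach is sketched below. VISIT_FIELDS and validate_visit() are hypothetical names introduced for this illustration and are not part of the Cloud Datastore library; the type check for visitor assumes Python 3 str.

```python
from datetime import datetime

# Hypothetical schema: field name -> expected type. The Datastore client
# library itself enforces no structure on entities.
VISIT_FIELDS = {'visitor': str, 'timestamp': datetime}

def validate_visit(record):
    """Raise ValueError if record lacks a field or a value has the wrong type."""
    for name, expected in VISIT_FIELDS.items():
        if name not in record:
            raise ValueError('missing field: {}'.format(name))
        if not isinstance(record[name], expected):
            raise ValueError('bad type for field: {}'.format(name))
    return record
```

The sample app is simple enough that it does not need this; the sketch only shows where the checks that NDB properties used to imply could live.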

3. Datastore access

Migrating to Cloud Datastore requires changing how you create, store, and query Datastore entities (at the user level). For your applications, the difficulty of this migration depends on how complex your Datastore code is. In our sample app, we attempted to make the update as straightforward as possible. Here is our starting code:

  • BEFORE:
def store_visit(remote_addr, user_agent):
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent)).put()

def fetch_visits(limit):
    with ds_client.context():
        return (v.to_dict() for v in Visit.query().order(
                -Visit.timestamp).fetch_page(limit)[0])

With Cloud Datastore, create a generic entity, identifying grouped objects in your Entity with a "key". Create the data record with a JSON object (Python dict) of key-value pairs, then write it to Datastore with the expected put(). Querying is similar but more straightforward with Datastore. Here you can see how the equivalent Datastore code differs:

  • AFTER:
def store_visit(remote_addr, user_agent):
    entity = datastore.Entity(key=ds_client.key('Visit'))
    entity.update({
        'timestamp': datetime.now(),
        'visitor': '{}: {}'.format(remote_addr, user_agent),
    })
    ds_client.put(entity)

def fetch_visits(limit):
    query = ds_client.query(kind='Visit')
    query.order = ['-timestamp']
    return query.fetch(limit=limit)

Update the function bodies for store_visit() and fetch_visits() as above, keeping their signatures identical to the previous version. There are no changes at all to the main handler root(). After completing these changes, your app is now outfitted to use Cloud Datastore and ready to test.
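Since the timestamp is now set by your own code instead of by auto_now_add, it can help to build the record in a small pure function that is easy to test without a Datastore connection. The following is a minimal sketch; make_visit_record() and its now parameter are hypothetical and do not appear in the sample repo.

```python
from datetime import datetime

def make_visit_record(remote_addr, user_agent, now=None):
    """Build the dict of fields that store_visit() writes to the entity.

    `now` is a hypothetical hook so tests can supply a fixed timestamp.
    """
    return {
        'timestamp': now if now is not None else datetime.now(),
        'visitor': '{}: {}'.format(remote_addr, user_agent),
    }
```

With such a helper, the body of store_visit() could call entity.update(make_visit_record(remote_addr, user_agent)) in place of the inline dict, leaving its behavior unchanged.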

6. Summary/Cleanup

Deploy application

Re-deploy your app with gcloud app deploy, and confirm the app works. Your code should now match what's in the Module 3 repo folders:

If you jumped into this series without doing any of the preceding codelabs, the app itself doesn't change; it registers all visits to the main web page (/) and looks like this once you've visited the site enough times:

[Screenshot: visitme app]

Congrats on completing this Module 3 codelab. You now know that you can use either the Cloud NDB or the Cloud Datastore client library to access Datastore. By migrating to the latter, you get the benefits of shared libraries, common code, and code reuse for consistency and reduced cost of maintenance.

Optional: Clean up

What about cleaning up to avoid being billed until you're ready to move onto the next migration codelab? As existing developers, you're likely already up-to-speed on App Engine's pricing information.

Optional: Disable app

If you're not ready to go to the next tutorial yet, disable your app to avoid incurring charges. When you're ready to move on to the next codelab, you can re-enable it. While your app is disabled, it won't receive any traffic and so won't incur charges for it. However, you can still be billed for Datastore usage if it exceeds the free quota, so delete enough data to fall under that limit.

On the other hand, if you're not going to continue with migrations and want to delete everything completely, you can shut down your project.

Next steps

From here, feel free to explore these next migration modules:

  • Module 3 Bonus: Continue to the bonus section to learn how to port to Python 3 and the next generation App Engine runtime.
  • Module 7: App Engine Push Task Queues (required if you use [push] Task Queues)
    • Adds App Engine taskqueue push tasks to Module 1 app
    • Prepares users for migrating to Cloud Tasks in Module 8
  • Module 4: Migrate to Cloud Run with Docker
    • Containerize your app to run on Cloud Run with Docker
    • Allows you to stay on Python 2
  • Module 5: Migrate to Cloud Run with Cloud Buildpacks
    • Containerize your app to run on Cloud Run with Cloud Buildpacks
    • You do not need to know anything about Docker, containers, or Dockerfiles
    • Requires you to have already migrated your app to Python 3
  • Module 6: Migrate to Cloud Firestore
    • Migrate to Cloud Firestore to access Firebase features
    • While Cloud Firestore supports Python 2, this codelab is available only in Python 3.

7. BONUS: Migrate to Python 3

To access the latest App Engine runtime and features, we recommend that you migrate to Python 3. In our sample app, Datastore was the only built-in service we used, and since we've migrated from ndb to Cloud NDB (Module 2) and then to Cloud Datastore, we can now port to App Engine's Python 3 runtime.

Overview

While porting to Python 3 is not within the scope of a Google Cloud tutorial, this part of the codelab gives developers an idea of how the Python 3 App Engine runtime differs. One outstanding feature of the next-gen runtime is simplified access to third-party packages: There's no need to specify built-in packages in app.yaml nor a requirement to copy or upload non-built-in libraries; they are implicitly installed from being listed in requirements.txt.

Because our sample is so basic and Cloud Datastore is Python 2-3 compatible, no application code needs to be explicitly ported to 3.x: The app runs on 2.x and 3.x unmodified, meaning the only required changes are in configuration in this case:

  1. Simplify app.yaml to reference Python 3 and remove reference to bundled 3rd-party libraries.
  2. Delete appengine_config.py and the lib folder as they're no longer necessary.

The main.py and templates/index.html application files remain unchanged.

Update requirements.txt

The final version of the Cloud Datastore library supporting Python 2 is 1.15.3. Update requirements.txt with the latest version for Python 3 (which may be newer by now). When this tutorial was written, the latest version was 2.1.0, so edit that line to look like this (or whatever the latest version is):

google-cloud-datastore==2.1.0

Simplify app.yaml

BEFORE:

The only real change for this sample app is to significantly shorten app.yaml. As a reminder, here's what we had in app.yaml at the conclusion of Module 3:

runtime: python27
threadsafe: yes
api_version: 1

handlers:
- url: /.*
  script: main.app

libraries:
- name: grpcio
  version: 1.0.0
- name: setuptools
  version: 36.6.0

AFTER:

In Python 3, the threadsafe, api_version, and libraries directives are all deprecated; all apps are presumed threadsafe and api_version isn't used in Python 3. There are no longer built-in third-party packages preinstalled on App Engine services, so libraries is also deprecated. Check the documentation on changes to app.yaml for more information on these changes. As a result, you should delete all three from app.yaml and update to a supported Python 3 version (see below).

Optional: Use of handlers directive

In addition, the handlers directive, which directs traffic to App Engine applications, has also been deprecated. Since the next-gen runtime expects web frameworks to manage app routing, all "handler scripts" must be changed to "auto". Combining the changes from above, you arrive at this app.yaml:

runtime: python38

handlers:
- url: /.*
  script: auto

Learn more about script: auto from the app.yaml reference page.

Removing handlers directive

Since handlers is deprecated, you can remove the entire section too, leaving a single-line app.yaml:

runtime: python38

By default, this will launch the Gunicorn WSGI web server which is available for all applications. If you're familiar with gunicorn, this is the command executed when it's started by default with the barebones app.yaml:

gunicorn main:app --workers 2 -c /config/gunicorn.py
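To summarize the app.yaml changes described so far, the sketch below applies them to a plain Python dict standing in for the parsed file. It is an illustration only: simplify_app_config() is a hypothetical function, the dict is assumed to come from whatever YAML parser you use, and the default runtime value simply mirrors the python38 used in this tutorial.

```python
# Directives described above as deprecated in the Python 3 runtime.
DEPRECATED = ('threadsafe', 'api_version', 'libraries')

def simplify_app_config(config, runtime='python38', keep_handlers=True):
    """Return a copy of a parsed app.yaml mapping updated for Python 3."""
    new = {k: v for k, v in config.items() if k not in DEPRECATED}
    new['runtime'] = runtime
    if keep_handlers and 'handlers' in new:
        # Handler scripts must be "auto" in the next-gen runtime.
        new['handlers'] = [
            dict(h, script='auto') if 'script' in h else dict(h)
            for h in new['handlers']
        ]
    else:
        # handlers is optional, so it can be dropped entirely.
        new.pop('handlers', None)
    return new

py2_config = {
    'runtime': 'python27',
    'threadsafe': True,
    'api_version': 1,
    'handlers': [{'url': '/.*', 'script': 'main.app'}],
    'libraries': [{'name': 'grpcio', 'version': '1.0.0'},
                  {'name': 'setuptools', 'version': '36.6.0'}],
}
with_handlers = simplify_app_config(py2_config)
minimal = simplify_app_config(py2_config, keep_handlers=False)
```

Here with_handlers corresponds to the app.yaml that keeps a handlers section with script: auto, and minimal corresponds to the single-line version.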

Optional: Use of entrypoint directive

If, however, your application requires a specific start-up command, that can be specified with an entrypoint directive, resulting in an app.yaml that looks like this:

runtime: python38
entrypoint: python main.py

This example specifically requests the Flask development server be used instead of gunicorn. Code that starts the development server must also be added to your app to launch on the 0.0.0.0 interface on port 8080 by adding this small section to the bottom of main.py:

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080, debug=True)

Learn more about entrypoint from the app.yaml reference page. More examples and best practices can be found in the App Engine standard environment startup docs as well as the App Engine flexible environment startup docs.
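If you start the server yourself, you may prefer not to hard-code the port. App Engine documents a PORT environment variable for the second-generation runtimes, so one option is to read it and fall back to 8080 when it is absent. The helper name get_port() below is hypothetical, and this is a sketch rather than the sample app's actual code.

```python
import os

def get_port(environ=None, default=8080):
    """Return the port to listen on, preferring the PORT environment variable."""
    environ = os.environ if environ is None else environ
    return int(environ.get('PORT', default))
```

In main.py this could be used as app.run(host='0.0.0.0', port=get_port(), debug=True) inside the same if __name__ == '__main__' block.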

Delete appengine_config.py and lib

Delete the appengine_config.py file and the lib folder. In migrating to Python 3, App Engine acquires and installs packages listed in requirements.txt.

The appengine_config.py config file is used to recognize third-party libraries/packages, whether you've copied them yourself or use ones already available on App Engine servers (built-in). When moving to Python 3, the big changes can be summarized as follows:

  1. No bundling of copied third-party libraries (listed in requirements.txt)
  2. No pip install into a lib folder, meaning no lib folder period
  3. No listing built-in third-party libraries in app.yaml
  4. No need to point the app at third-party libraries, so no appengine_config.py file

Listing all required third-party libraries in requirements.txt is all that's needed.

Deploy application

Re-deploy your app to ensure that it works. You can also confirm how close your solution is to the Module 3 sample Python 3 code. To visualize the differences with Python 2, compare the code with its Python 2 version.

Congrats on finishing the bonus step in Module 3! Visit the documentation on preparing configuration files for the Python 3 runtime. Finally, review the earlier summary above for next steps and cleanup.

Preparing your application

When it is time to migrate your application, you will have to port your main.py and other application files to 3.x, so a best practice is to try your best to make your 2.x application as "forward-compatible" as possible.

There are plenty of online resources to help you accomplish that, but some of the key tips:

  1. Ensure all application dependencies are fully 3.x-compatible
  2. Ensure your application runs on at least 2.6 (preferably 2.7)
  3. Ensure your application passes its entire test suite (with a minimum of 80% coverage)
  4. Use compatibility libraries such as six, Future, and/or Modernize
  5. Educate yourself on key backwards-incompatible 2.x vs. 3.x differences
  6. Any I/O will likely lead to Unicode vs. byte string incompatibilities

The sample app was designed with all this in mind, which is why it runs on 2.x and 3.x right out of the box, letting us focus on showing you what needs to be changed in order to use the next-gen platform.
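As an illustration of the Unicode vs. byte string tip, here is a minimal sketch of a helper that normalizes values to text on both 2.x and 3.x without any compatibility library. The names PY2, text_type, and to_text() are hypothetical; libraries such as six provide similar utilities.

```python
import sys

PY2 = sys.version_info[0] == 2
# On 2.x the text type is `unicode`; on 3.x it is `str`.
text_type = unicode if PY2 else str  # noqa: F821 (unicode exists only on 2.x)

def to_text(value, encoding='utf-8'):
    """Return value as a Unicode string on both 2.x and 3.x."""
    if isinstance(value, bytes):
        # bytes is an alias of str on 2.x, so this branch also covers
        # plain 2.x string literals.
        return value.decode(encoding)
    return text_type(value)
```

Decoding at the I/O boundary and working with text everywhere else is one common way to keep a 2.x codebase forward-compatible.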

8. Additional resources

App Engine migration module codelabs issues/feedback

If you find any issues with this codelab, please search for your issue first before filing. Links to search and create new issues:

Migration resources

Links to the repo folders for Module 2 (START) and Module 3 (FINISH) can be found in the table below. They can also be accessed from the repo for all App Engine migrations, which you can clone or download as a ZIP file.

Codelab     Python 2    Python 3
Module 2    code        code
Module 3    code        code

App Engine resources

Below are additional resources regarding this specific migration: