Case Study for Chapter 4, Expecting the Unexpected

This section fills in some details of the web service that implements the classification. Recall from Chapter One that the objective here is to build a simple classifier before building a more realistic and sophisticated classifier. In the long run, a clever production recommendation service can be a service offering or -- better yet -- a platform on which companies build service offerings.

We need to control access to the features of this service. If we're going to make money, we'll need to be sure only properly authenticated users are performing properly authorized actions.

User Authentication identifies who the user is. Until a user is authenticated, nothing else can be done.

User Authorization defines what the user is permitted to do. In our case study, there are two roles, keeping things relatively simple.

Context View

The role of "User" in the Context Diagram from Chapter One is -- at this point -- less than ideal. It was tolerable as an initial description of the interfaces to the application. It has now become clearer that a term like "Researcher" might be a better description for someone researching a sample and looking for a classification.

Here's an expanded context diagram with this new consideration of authenticating users.

[UML diagram: expanded context diagram with the Authenticates use case]

We've added an additional use case, Authenticates, for both classes of users. Once a user is authenticated, only the actions specifically authorized for that user will be permitted.

Development View

When working with the Flask framework, the development view can help to visualize the components that we'll need to build.

[UML diagram: development view of the Flask application components]

This diagram decomposes the application into three parts:

In this section, we'll focus on the view functions.

In order to implement these functions, we'll need to introduce some additional classes that aren't a proper part of the problem domain. The ideas of User, Botanist, and Researcher don't have much direct connection with the TrainingData, Sample, Hyperparameter, or Distance classes.

Much of this processing is readily available in the flask package. We'll examine a logical view of the necessary Flask components.

Logical View

The following diagram shows how the Flask object has several view functions. We've shown the functions using a conventional UML class rectangle with only two sections, with a stereotype of «function» to clarify that each one isn't a class.

Each view function shares a common authenticate decorator. This, too, is shown in a class-like rectangle, even though it's a Python function. The decorator relies on the Users class, which is a collection of User instances.

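[UML diagram: logical view showing the Flask object, its «function» view functions, the «decorator» authenticate, and the Users, User, and Role classes]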

The authenticate rectangle has a UML stereotype of «decorator» to remind us that this is a function used to add a feature to view functions.

We've introduced a Role as an enumeration of distinct values that summarize actions available to a user. Each of the view functions will check to see which Role value the user has. Users without the required Role value will receive an error.
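The Role check described above can be sketched without Flask. In this sketch, `requires_role`, `PermissionDenied`, and the `current_role` callable are hypothetical names; in the application, the current user would come from `g.user` after authentication. The assignment of actions to roles is illustrative, not taken from the case study.

```python
from enum import Enum
from functools import wraps
from typing import Any, Callable, Dict, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


class Role(str, Enum):
    UNDEFINED = ""
    BOTANIST = "botanist"
    RESEARCHER = "researcher"


class PermissionDenied(Exception):
    """Hypothetical error for an authenticated user lacking the required Role."""


def requires_role(current_role: Callable[[], Role], *allowed: Role) -> Callable[[F], F]:
    """Build a decorator that permits a call only when the current role is allowed."""

    def decorator(function: F) -> F:
        @wraps(function)  # keep the wrapped function's name and docstring
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            role = current_role()
            if role not in allowed:
                raise PermissionDenied(f"{role.value!r} may not call {function.__name__}")
            return function(*args, **kwargs)

        return cast(F, wrapper)

    return decorator


# A stand-in for "the role of the currently authenticated user".
session: Dict[str, Role] = {"role": Role.RESEARCHER}


@requires_role(lambda: session["role"], Role.BOTANIST)
def upload_training_data() -> str:
    return "uploaded"


@requires_role(lambda: session["role"], Role.BOTANIST, Role.RESEARCHER)
def classify() -> str:
    return "classified"
```

Checking the role inside the wrapper, rather than in each view body, keeps the authorization rule in one place, much as the authenticate decorator does for authentication.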

User and Identities

We'll handle user authentication with password validation, and avoid the more secure (and more complex) schemes like OAuth. When working with passwords, it's important to store only hashes of passwords. The werkzeug.security module provides a secure password validation mechanism.

There are important security rules related to passwords:

These rules can never be violated safely. There is no reversible encryption that is safe from a disgruntled employee. We've emphasized this strongly because it is a surprisingly common vulnerability.
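To make the "store only hashes" idea concrete, here is a standard-library sketch of salted, deliberately slow hashing with PBKDF2. This is not werkzeug's implementation; werkzeug.security handles the salt, algorithm choice, and storage format for us. The `hash_password` and `check_password` names, the iteration count, and the `salt$digest` storage format are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(plain_text: str) -> str:
    """Return 'salt$digest' in hex; the plain text itself is never stored."""
    salt = secrets.token_bytes(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", plain_text.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def check_password(stored: str, candidate: str) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    salt_hex, _, digest_hex = stored.partition("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(digest.hex(), digest_hex)
```

Because each call to `hash_password()` uses a new salt, hashing the same password twice yields different stored values, and there is no key anyone could use to reverse them.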

One more important point on the User class. We need to separate our notion of "user" from their various identities around the internet. Only allowing for a single email address is a potential problem.
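One hedged way to act on this is to let a person own several identities and index each identity back to its owner. The `Person`, `Identity`, and `IdentityIndex` names below are hypothetical; the case study's User class keeps a single email field.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)  # frozen makes Identity hashable, so it can be a dict key
class Identity:
    kind: str   # e.g. "email"; other kinds are up to the application
    value: str


@dataclass
class Person:
    username: str
    identities: List[Identity] = field(default_factory=list)


class IdentityIndex:
    """Map each identity back to the single person who owns it."""

    def __init__(self) -> None:
        self._owners: Dict[Identity, Person] = {}

    def add(self, person: Person, identity: Identity) -> None:
        owner = self._owners.get(identity)
        if owner is not None and owner is not person:
            raise ValueError(f"{identity.value} already belongs to {owner.username}")
        if identity not in person.identities:
            person.identities.append(identity)
        self._owners[identity] = person

    def find(self, identity: Identity) -> Optional[Person]:
        return self._owners.get(identity)
```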

Next, we'll look at some code for the User class. Each person will be represented by an instance of this class.

The Data Model

We'll start by defining the roles a user can play. These correspond to the actor classes shown on the context diagram in Chapter One.

Here's the enumeration of actor roles in the Role class:


from enum import Enum

class Role(str, Enum):
    UNDEFINED = ""
    BOTANIST = "botanist"
    RESEARCHER = "researcher"

This lets us assign a string that maps to one of these enumerated values. The Role class object implements a handy lookup to convert strings to enumerated Role values.


>>> from classifier import Role

>>> Role("botanist")
<Role.BOTANIST: 'botanist'>
>>> Role("invalid")
Traceback (most recent call last):
...
ValueError: 'invalid' is not a valid Role

This provides a handy, bounded list of valid values, collected into a namespace named Role.
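Because Role mixes in str, each member is also a string. The short sketch below repeats the Role definition so it runs on its own; `parse_role()` is a hypothetical helper that maps unknown strings to `Role.UNDEFINED` instead of raising `ValueError`.

```python
from enum import Enum


class Role(str, Enum):
    UNDEFINED = ""
    BOTANIST = "botanist"
    RESEARCHER = "researcher"


def parse_role(text: str) -> Role:
    """Hypothetical helper: fall back to UNDEFINED for unrecognized strings."""
    try:
        return Role(text)
    except ValueError:
        return Role.UNDEFINED


# Iterating the class yields the bounded set of members, in definition order.
valid_roles = [role.value for role in Role]  # ["", "botanist", "researcher"]
```

Since `Role.BOTANIST == "botanist"` is true, Role values can be written to CSV files or compared with request data without extra conversion.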

The User class defines an individual User of our classifier application.


from typing import Optional

import werkzeug.security


class User:

    headers = ["username", "email", "real_name", "role", "password"]

    def __init__(
        self,
        username: str,
        email: str,
        real_name: str,
        role: Role,
        password: Optional[str] = None,
    ) -> None:
        self.username = username
        self.email = email
        self.real_name = real_name
        self.role = role
        self.password = password

    def set_password(self, plain_text: str) -> None:
        self.password = werkzeug.security.generate_password_hash(plain_text)

    def valid_password(self, plain_text: str) -> bool:
        return werkzeug.security.check_password_hash(
            self.password or "md5$$", plain_text
        )

    def __repr__(self) -> str:
        return (
            f"{self.__class__.__name__}("
            f"username={self.username!r}, "
            f"email={self.email!r}, "
            f"real_name={self.real_name!r}, "
            f"role={self.role.value!r}, "
            f"password={self.password!r})"
        )

The headers attribute of the class is a handy list of the column titles required in a CSV file whose rows can create User instances. We'll return to this below.

The set_password() and valid_password() methods work with password hashes. See the important security notes above: it's unsafe to store any reversible password encryption. We use the werkzeug.security module to create the secure hashes we save for each user. We also use the werkzeug.security module to compare an existing hash with a candidate password to see if the hashes match.

In addition to working with users, it can help to see how we acquire the initial list of users to work with. Larger applications may use databases to help coordinate work among multiple servers. For smaller applications, a shared file will work out very nicely.

Reading Users from CSV Files

A detailed treatment of reading and writing User objects is deferred to Chapter Nine, where we talk about serialization techniques.

Here are the other methods of the User class to convert from CSV dictionaries to objects and from objects back to a dictionary suitable for CSV writing.


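    # More User methods; these rely on `from typing import Any, Dict`.
    @staticmethod
    def from_dict(csv_row: Dict[str, str]) -> "User":
        return User(
            username=csv_row["username"],
            email=csv_row["email"],
            real_name=csv_row["real_name"],
            role=Role(csv_row["role"]),
            password=csv_row["password"],
        )

    def __eq__(self, other: Any) -> bool:
        # Let Python try other comparisons when other isn't a User.
        if not isinstance(other, User):
            return NotImplemented
        # The password hash is not part of the comparison.
        return all(
            [
                self.username == other.username,
                self.email == other.email,
                self.real_name == other.real_name,
                self.role == other.role,
            ]
        )

    def asdict(self) -> Dict[str, Optional[str]]:
        # One row for csv.DictWriter; the keys match User.headers.
        return {
            "username": self.username,
            "email": self.email,
            "real_name": self.real_name,
            "role": self.role.value,
            "password": self.password,
        }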

For now, we'll provide a common template for creating objects from CSV source data.
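A round trip through the csv module shows how a headers list and a pair of dictionary conversions work together. `Account` is a hypothetical, trimmed stand-in for User (a plain string role, no hashing methods) so the block runs without werkzeug, and `write_accounts()` and `read_accounts()` are illustrative helpers that use an in-memory buffer instead of a file.

```python
import csv
import io
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Account:
    """Hypothetical stand-in for User with the same CSV columns."""
    username: str
    email: str
    real_name: str
    role: str
    password: str

    # Unannotated, so this is a plain class attribute rather than a field.
    headers = ["username", "email", "real_name", "role", "password"]

    @staticmethod
    def from_dict(csv_row: Dict[str, str]) -> "Account":
        return Account(**{name: csv_row[name] for name in Account.headers})


def write_accounts(accounts: List[Account]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, Account.headers)
    writer.writeheader()
    writer.writerows(asdict(a) for a in accounts)
    return buffer.getvalue()


def read_accounts(text: str) -> List[Account]:
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [Account.from_dict(row) for row in reader if row]
```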

The Users collection will use these methods to build the User instances. We'll look at the Users collection class next.

Users Collection

Individual User objects are collected into a Flask extension class, Users. This class has an important Flask-specific feature: the init_app() method. This method binds the extension to the current Flask application; it provides configuration information that the extension can use. In this case, the Flask application's configuration can provide the USER_FILE parameter to name the file of users to load.

Our Users class is an example of Design by Composition. We've wrapped a dictionary into a class that offers some extra methods. The internal dictionary is a mapping from user name to User instance. The extra methods load User instances from a CSV file.


import csv
from pathlib import Path
from typing import Dict, Optional

from flask import Flask


class Users:
    def __init__(self, init: Optional[Dict[str, User]] = None) -> None:
        self.users = init or {}
        self.anonymous = User("", "", "", Role.UNDEFINED)
        self.app: Optional[Flask] = None

    def init_app(self, app: Flask) -> None:
        self.app = app
        self.app.config.setdefault("USER_FILE", Path("users.csv"))

    def get_user(self, name: str, default: Optional[User] = None) -> User:
        if not self.app:
            raise RuntimeError("Users not bound to an app")
        if not self.users:
            self.load()
        return self.users.get(name, default or self.anonymous)

    def load(self) -> None:
        """Load file when needed."""
        if not self.app:
            raise RuntimeError("Users not bound to an app")
        with self.app.config["USER_FILE"].open(newline="") as user_file:
            row_iter = csv.DictReader(user_file)
            user_iter = (User.from_dict(row) for row in row_iter if row)
            self.users = {user.username: user for user in user_iter}

    def save(self) -> None:
        if not self.app:
            raise RuntimeError("Users not bound to an app")
        with self.app.config["USER_FILE"].open("w", newline="") as user_file:
            writer = csv.DictWriter(user_file, User.headers)
            writer.writeheader()
            writer.writerows(u.asdict() for u in self.users.values())
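Because Users wraps its dictionary instead of inheriting from dict, it exposes only the methods we write; as shown, it doesn't support `len()` or `values()`. The hypothetical `Directory` class below sketches how a composed wrapper can delegate a chosen subset of dictionary behavior; its string values stand in for User objects.

```python
from typing import Dict, Iterator, Optional, ValuesView


class Directory:
    """Hypothetical composition example: wraps a dict, exposes a chosen subset."""

    def __init__(self, init: Optional[Dict[str, str]] = None) -> None:
        # Copy, so later changes to the caller's dict don't leak in.
        self._entries: Dict[str, str] = dict(init or {})

    def __len__(self) -> int:
        return len(self._entries)

    def __contains__(self, name: object) -> bool:
        return name in self._entries

    def __iter__(self) -> Iterator[str]:
        return iter(self._entries)

    def values(self) -> ValuesView[str]:
        return self._entries.values()

    def get(self, name: str, default: str = "") -> str:
        return self._entries.get(name, default)
```

There is no `__setitem__()`, so code outside the class can read entries but can't replace them; that kind of control is the main benefit of composition over inheritance here.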

The get_user() method uses a technique called "lazy loading". This method gets the data from the file only when the first request that needs validation arrives. We can expand this technique to check the file's modification timestamp and reload the file only after it changes. A number of web servers can then share a single file, each reloading it after the file changes.
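The timestamp check can be sketched with the standard library. `ReloadingTable` is a hypothetical class, not part of the case study: it calls `os.stat()` on every lookup and rereads the CSV file only when the modification time (in nanoseconds) differs from the one recorded at the last load.

```python
import csv
import os
from pathlib import Path
from typing import Dict, Optional


class ReloadingTable:
    """Hypothetical sketch: lazily load a CSV file, reloading it when its mtime changes."""

    def __init__(self, path: Path, key: str) -> None:
        self.path = path
        self.key = key
        self.rows: Dict[str, Dict[str, str]] = {}
        self._loaded_mtime: Optional[int] = None  # None means "never loaded"

    def _refresh(self) -> None:
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._loaded_mtime:
            with self.path.open(newline="") as source:
                self.rows = {row[self.key]: row for row in csv.DictReader(source)}
            self._loaded_mtime = mtime

    def get(self, name: str) -> Optional[Dict[str, str]]:
        self._refresh()  # cheap stat on every request; a full read only after a change
        return self.rows.get(name)
```

If the file changes between the `stat()` and the read, the recorded timestamp is older than the file, so the next lookup simply reloads again.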

Now that we've defined a single User and the collection in the Users class, we can look at the details of authentication for a given request.

Authentication

The first part of the authentication design is the exception we'll raise if there's a problem.

from typing import Any, Dict, Optional


class NotAuthorized(Exception):
    status_code = 401

    def __init__(
        self,
        message: str,
        status_code: Optional[int] = None,
        payload: Optional[Dict[str, str]] = None,
    ) -> None:
        super().__init__(message)
        self.message = message
        if status_code is not None:
            self.status_code = status_code
        self.payload = payload

    def to_dict(self) -> Dict[str, Any]:
        rv: Dict[str, Any] = dict(self.payload or ())
        rv["message"] = self.message
        return rv

This is a common design for customized exception classes in a Flask context. We've provided a default status code of 401. Initializing the object requires a message; it can also accept a status code override and a payload of additional details.

The super().__init__(message) expression passes the message string to the superclass, Exception, which uses it to initialize the exception's args. This makes sure that the object has all of the behaviors of the built-in Exception class.

The second part of the authentication design is the @authenticate decorator function. A decorator function transforms a function into a new function. In this case, we want to expand the application's view functions with a standardized authentication check.


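import base64
import binascii
from functools import wraps
from typing import Any, Callable

from flask import Response, g, request


def authenticate(view_function: Callable[..., Response]) -> Callable[..., Response]:
    @wraps(view_function)
    def decorated_function(*args: Any, **kwargs: Any) -> Response:
        # Expect "Basic <base64 of username:password>".
        auth_body = request.headers.get("Authorization", "").split(" ")
        auth_type, credentials = auth_body if len(auth_body) == 2 else ("", ":")
        try:
            username, _, password = (
                base64.b64decode(credentials).decode("utf-8").partition(":")
            )
        except (binascii.Error, UnicodeDecodeError):
            # Malformed credentials are treated like an unknown user.
            username, password = "", ""
        g.user = users.get_user(username)
        # Evaluate every condition so all failures take a similar path.
        conditions = [
            auth_type.upper() == "BASIC",
            g.user.valid_password(password),
        ]
        if not all(conditions):
            raise NotAuthorized("Unknown User")
        # Flask passes URL variables as keyword arguments.
        return view_function(*args, **kwargs)

    return decorated_function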

The purpose of a decorator is to create a new function, decorated_function(), that uses the existing function, view_function(). The new function can pre-process parameters or post-process return values. In this case, the decorated function will examine the request headers to see if the Authorization header reflects a known user and their credentials.

As with the werkzeug.security.check_password_hash() function, we want to do the processing in as consistent a way as possible. The idea is to make sure the application behaves similarly for an unknown user and for a known user with a faulty password.

The type hints describe view functions using Callable[..., Response]. This syntax describes any function with arbitrary arguments that creates a Flask Response object.

Flask version 1.1.2, used for this book, does not have type hints. The make_response() function of Flask can deal with at least seven kinds of return values from a view function. To be a little more precise, we could use the following definition.

from typing import Any, Callable, Dict, List, Tuple, Union

from flask import Response

ReturnValue = Union[
    str, bytes, Dict[str, Any],
    Tuple[Any, ...], Response, Callable[..., List[str]]
]

Our application will not make use of all of these very general features. Indeed, we're going to rely on Flask's jsonify() function to create Response documents. This lets us claim that all view functions in our application are Callable[..., Response].

Now that we have an authentication decorator, we can look at the Flask application and use the decorator to create view functions.

The Flask Application

The Flask object is the application. We use this application object to define a number of web application features. We'll start with a customized error handler.


app = Flask(__name__)
app.config.from_object(Demo)  # os.environ["CLASSIFIER_CONFIG"]
users = Users()
users.init_app(app)


@app.errorhandler(NotAuthorized)
def handle_unauthorized(error: NotAuthorized) -> Response:
    response = cast(Response, jsonify(error.to_dict()))
    response.status_code = error.status_code
    return response

The app object is the Flask application. We immediately provide a configuration. We'll look at the configuration objects below. They're classes that provide the kinds of definitions that might change between development and production.
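As a hedged sketch of configuration objects: the `Config`, `Development`, and `Production` names and every value below are assumptions, and this `Demo` only guesses at the real one. The `CLASSIFIER_CONFIG` environment variable follows the hint in the comment on the `from_object()` line. Flask's `app.config.from_object()` copies only the uppercase attributes of the object it's given; `uppercase_settings()` imitates that so the sketch runs without Flask.

```python
import os
from pathlib import Path
from typing import Any, Dict, Type


class Config:
    TESTING = False
    USER_FILE = Path("users.csv")


class Development(Config):
    TESTING = True
    USER_FILE = Path("test_users.csv")


class Production(Config):
    USER_FILE = Path("/var/classifier/users.csv")  # hypothetical location


class Demo(Config):
    TESTING = True


CONFIGS: Dict[str, Type[Config]] = {
    "development": Development,
    "production": Production,
    "demo": Demo,
}


def select_config(default: str = "demo") -> Type[Config]:
    """Pick a config class by name; an unknown name raises KeyError."""
    return CONFIGS[os.environ.get("CLASSIFIER_CONFIG", default)]


def uppercase_settings(obj: object) -> Dict[str, Any]:
    """Imitate from_object(): collect only the UPPERCASE attributes, inherited ones included."""
    return {name: getattr(obj, name) for name in dir(obj) if name.isupper()}
```

Subclasses override only what differs, so settings shared by every environment live in one place.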

The Users collection will contain all of the user account information. We use the Users.init_app() method to bind the collection to the current application. This gives the class access to the application's configuration.

The @app.errorhandler decorator binds an error-handling function to a specific subclass of exceptions. In this case, any NotAuthorized exceptions will be specially handled by this function. This function transforms the exception into a conventional Response object.

There are a number of other exceptions we can consider providing.

As we implement these other exceptions, we should consider creating a hierarchy of special exception classes. We can then have a single handler for all of our application-specific exceptions.
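A sketch of such a hierarchy follows. `ClassifierError`, `Forbidden`, `NotFound`, and `handle_classifier_error()` are hypothetical names, and NotAuthorized is redefined here as a subclass. Flask matches error handlers along an exception's class hierarchy, so registering one handler with `@app.errorhandler(ClassifierError)` would cover every subclass; the handler body is shown as a plain function returning a `(body, status)` pair so the block runs without Flask.

```python
from typing import Any, Dict, Optional, Tuple


class ClassifierError(Exception):
    """Hypothetical base class for application-specific errors."""

    status_code = 500

    def __init__(self, message: str, payload: Optional[Dict[str, str]] = None) -> None:
        super().__init__(message)
        self.message = message
        self.payload = payload

    def to_dict(self) -> Dict[str, Any]:
        body: Dict[str, Any] = dict(self.payload or ())
        body["message"] = self.message
        return body


class NotAuthorized(ClassifierError):
    status_code = 401


class Forbidden(ClassifierError):
    status_code = 403


class NotFound(ClassifierError):
    status_code = 404


def handle_classifier_error(error: ClassifierError) -> Tuple[Dict[str, Any], int]:
    """One handler for the whole hierarchy: each subclass supplies its own status."""
    return error.to_dict(), error.status_code
```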

Here's our first view function:

@app.route("/health")
def user_list() -> Response:
    # Be sure the users database gets loaded.
    users.get_user("")
    response = {"status": "OK", "user_count": len(users.users)}
    if app.config["TESTING"]:
        response["users"] = [u.asdict() for u in users.users.values()]
    return cast(Response, jsonify(response))

This is a health check that can be used to confirm the application is running. It does not require a known user. It makes a special get_user() request that may force the user data file to be loaded. Since the empty user name doesn't match any user, get_user() returns the special "anonymous" user defined in the Users class.

When we're testing, it can help to return all the users. This makes it easier to confirm which user file was loaded.

We can start this server with the flask run command at the command line. This requires an environment variable, FLASK_APP, to name the module with the application.

export PYTHONPATH=src
export FLASK_APP=classifier.py
export FLASK_ENV=development
flask run

Next, we can look at how we use the @authenticate decorator to create a view that only works for requests that include a valid Authorization header.

Authenticated View Function

We'll start with a very simple view function that shows how authentication works.

Here's what we add to our classifier.py application.

@app.route("/whoami")
@authenticate
def who_am_i() -> Response:
    # Log the user name only; request.headers would include the Authorization credentials.
    app.logger.info(f"whoami: User {g.user.username!r}")
    return cast(
        Response,
        jsonify(
            {
                "status": "OK",
                "user": g.user.asdict(),
            }
        )
    )

The route is "/whoami". The full URL will be http://127.0.0.1:5000/whoami when the server is running on the local machine.

The @authenticate decorator wraps the view function in the authentication check. This will look at the request.headers to be sure the request comes from a known user.

The view function writes to the log, and returns details of the User object. This is enough to confirm that we can use authenticated view functions.

We can use curl -v to see all of the details of the exchange with the server. This can be helpful debugging information.

We can also use curl -w 'status %{response_code}' http://127.0.0.1:5000/whoami to write the final status code; this can be a helpful summary. (In Windows batch files, each % must be doubled to %%.)

The curl request with -u 'noriko:Hunter2' should create the following Authorization header:

> Authorization: Basic bm9yaWtvOkh1bnRlcjI=

This will be parsed by our authentication decorator.

This is a base64 encoding of the username and the password, not an encryption. When sent over plain HTTP, it's easily decoded by anyone who can see the traffic. When sent over HTTPS, however, it's as secure as the keys defined in the certificates used to create the connection.
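A standard-library sketch makes the point: building the header value that curl -u sends, and recovering the credentials from it, each takes a single base64 call. `basic_auth_header()` and `parse_basic_auth()` are hypothetical helpers; the parsing mirrors what the authenticate decorator does.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization value that curl -u 'username:password' sends."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic_auth(header: str) -> Tuple[str, str]:
    """Split 'Basic <token>' back into (username, password)."""
    auth_type, _, credentials = header.partition(" ")
    if auth_type.upper() != "BASIC":
        return "", ""
    # partition() splits at the first colon, so a username can't contain one.
    username, _, password = base64.b64decode(credentials).decode("utf-8").partition(":")
    return username, password
```

Usage: `basic_auth_header("noriko", "Hunter2")` produces the same `Basic bm9yaWtvOkh1bnRlcjI=` value shown in the curl output.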

The on-line code repository has details on setting up HTTPS to create a secure web server. It doesn't involve any new Python programming.