This section will fill in some details of the web service that implements the classification. Recall from Chapter One, the objective here is to build a simple classifier before building a more realistic and sophisticated classifier. In the long run, a clever production recommendation service can be a service offering or -- better yet -- a platform on which companies build service offerings.
We need to control access to the features of this service. If we're going to make money, we'll need to be sure only properly authenticated users are performing properly authorized actions.
User Authentication identifies who the user is. Until a user is authenticated, nothing else can be done.
User Authorization defines what the user is permitted to do. In our case study, there are two roles, keeping things relatively simple.
The role of "User" in the Context Diagram from Chapter One is -- at this point -- less than ideal. It was tolerable as an initial description of the interfaces to the application. Now it has become clearer that a term like "Researcher" might be a better description for someone researching a sample and looking for a classification.
Here's an expanded context diagram with this new consideration of authenticating users.
We've added an additional use case, Authenticates, for both classes of users. Once a user is authenticated, only the actions they're specifically authorized to perform will be permitted.
When working with the Flask framework, the development view can help to visualize the components that we'll need to build.
This diagram decomposes the application into three parts:
The Data Model. This is the data from the problem domain: TrainingData, Sample, and Hyperparameter. The web site depends on this.
The View Functions. These are functions, bound into a Flask application, that respond to web requests. They depend on the data model to provide the functions visible through the API.
The Tests. We'll set much of this aside until Chapter Thirteen.
In this section, we'll focus on the view functions.
In order to implement these functions, we'll need to introduce some additional classes that aren't a proper part of the problem domain. The idea of a User, Botanist, or Researcher doesn't have much direct connection with the TrainingData, Sample, Hyperparameter, or Distance classes.
A great deal of this processing is readily available in the flask package.
We'll examine a logical view of the necessary Flask components.
The following diagram shows how the Flask object has several view functions.
We've shown the functions using a conventional UML class rectangle with only two sections, but with a stereotype of «function» to clarify that this isn't a class.
Each view function shares a common authenticate decorator. This, too, is shown in a class-like rectangle, even though it's a Python function.
The decorator relies on the class Users, which is a collection of User instances.
The Authenticate element in the diagram has a UML stereotype of «decorator» to remind us this is a function used to add a feature to view functions.
We've introduced a Role as an enumeration of distinct values that summarize actions available to a user.
Each of the view functions will check to see which Role value the user has. Users without the required Role value will receive an error.
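The role check described above could be packaged as a second decorator, stacked under the authentication decorator. Here's a minimal, framework-free sketch; requires_role(), Forbidden, and the get_current_user parameter are hypothetical names, not part of the book's code or of Flask. In the Flask application, the current user would presumably come from g.user, which the authentication decorator sets. The Role enumeration is repeated so the block runs on its own.

```python
from enum import Enum
from functools import wraps
from typing import Any, Callable, TypeVar, cast


class Role(str, Enum):
    UNDEFINED = ""
    BOTANIST = "botanist"
    RESEARCHER = "researcher"


class Forbidden(Exception):
    """Hypothetical exception: a known user lacks the required role."""


F = TypeVar("F", bound=Callable[..., Any])


def requires_role(
    get_current_user: Callable[[], Any], *allowed: Role
) -> Callable[[F], F]:
    """Build a decorator that rejects users whose role isn't in ``allowed``."""

    def decorator(view: F) -> F:
        @wraps(view)  # keep the view's name, which Flask uses as the endpoint
        def checked(*args: Any, **kwargs: Any) -> Any:
            user = get_current_user()
            if user.role not in allowed:
                raise Forbidden(f"role {user.role.value!r} not permitted")
            return view(*args, **kwargs)

        return cast(F, checked)

    return decorator
```

Passing the user lookup in as a function keeps the sketch testable without a request context; a Flask version could read g.user directly instead.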
We'll handle user authentication with password validation, and avoid
the more secure (and more complex) schemes like OAuth.
When working with passwords, it's important to only store hashes of passwords.
The werkzeug.security module provides a secure password hashing and validation mechanism.
There are important security rules related to passwords:
Never store passwords.
Never store reversibly encrypted passwords. If the key is compromised, then all passwords are lost.
Only store hashes of passwords.
These rules can never be violated safely. There is no reversible encryption that is safe from a disgruntled employee. We've emphasized this strongly because it is a surprisingly common vulnerability.
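To make "only store hashes" concrete, here's a minimal standard-library sketch of salted password hashing with hashlib.pbkdf2_hmac and a constant-time comparison with hmac.compare_digest. It illustrates the principle only; it is not how werkzeug.security formats its hashes. The iteration count and the "salt$digest" storage format are assumptions chosen for this example, and hash_password() and check_password() are hypothetical helpers.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor; tune for your hardware


def hash_password(plain_text: str) -> str:
    """Return 'salt$digest' in hex; the plain text itself is never stored."""
    salt = secrets.token_hex(16)  # a fresh random salt for every hash
    digest = hashlib.pbkdf2_hmac(
        "sha256", plain_text.encode("utf-8"), salt.encode("ascii"), ITERATIONS
    )
    return f"{salt}${digest.hex()}"


def check_password(stored: str, candidate: str) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    salt, _, expected = stored.partition("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode("utf-8"), salt.encode("ascii"), ITERATIONS
    )
    return hmac.compare_digest(digest.hex(), expected)
```

Because each hash gets its own salt, two users with the same password end up with different stored values, and there is no key that could decrypt them.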
One more important point on the User class: we need to separate our notion of a "user" from their various identities around the internet. Allowing only a single email address is a potential problem.
Next, we'll look at some code for the User class. Each person will be represented by an instance of this class.
We'll start by defining the roles a user can play. These correspond to the actor classes shown on the context diagram in Chapter One.
Here's the enumeration of actor roles in the Role class:
```python
from enum import Enum


class Role(str, Enum):
    UNDEFINED = ""
    BOTANIST = "botanist"
    RESEARCHER = "researcher"
```
This lets us assign a string that maps to one of these enumerated values. The Role class object implements a handy lookup to convert strings to enumerated Role values.
```python
>>> from classifier import Role
>>> Role("botanist")
<Role.BOTANIST: 'botanist'>
>>> Role("invalid")
Traceback (most recent call last):
...
ValueError: 'invalid' is not a valid Role
```
This provides a handy, bounded list of valid values, collected into a namespace named Role.
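Because the lookup raises ValueError for an unknown string, code that reads roles from an external file may prefer a fallback. Here's a sketch, assuming that mapping unrecognized text to Role.UNDEFINED (which grants no actions) is acceptable rather than failing outright; parse_role() and VALID_ROLES are hypothetical names.

```python
from enum import Enum


class Role(str, Enum):
    UNDEFINED = ""
    BOTANIST = "botanist"
    RESEARCHER = "researcher"


def parse_role(text: str) -> Role:
    """Convert text to a Role, treating unknown values as UNDEFINED."""
    try:
        return Role(text.strip().lower())
    except ValueError:
        return Role.UNDEFINED


# The enumeration is iterable, which is handy for listing valid choices.
VALID_ROLES = [role.value for role in Role if role is not Role.UNDEFINED]
```

Since Role also subclasses str, a member compares equal to its plain string value, so Role.BOTANIST == "botanist" is true.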
The User class defines an individual User of our classifier application.
```python
from typing import Optional

import werkzeug.security


class User:
    # Column titles for a CSV file of users; see from_dict() and asdict().
    headers = ["username", "email", "real_name", "role", "password"]

    def __init__(
        self,
        username: str,
        email: str,
        real_name: str,
        role: Role,
        password: Optional[str] = None,
    ) -> None:
        self.username = username
        self.email = email
        self.real_name = real_name
        self.role = role
        self.password = password  # a password hash, never plain text

    def set_password(self, plain_text: str) -> None:
        self.password = werkzeug.security.generate_password_hash(plain_text)

    def valid_password(self, plain_text: str) -> bool:
        # A user without a stored hash (like the anonymous user) is checked
        # against a dummy hash that never matches.
        return werkzeug.security.check_password_hash(
            self.password or "md5$$", plain_text
        )

    def __repr__(self) -> str:
        return (
            f"{self.__class__.__name__}("
            f"username={self.username!r}, "
            f"email={self.email!r}, "
            f"real_name={self.real_name!r}, "
            f"role={self.role.value!r}, "
            f"password={self.password!r})"
        )
```
The headers attribute for the class is a handy list of the column titles required in a CSV file of rows that can create User instances. We'll return to this below.
The set_password() and valid_password() methods work with password hashes. See the important security notes above: it's unsafe to store any reversible password encryption.
We use the werkzeug.security module to create the secure hashes we save for each user.
We also use the werkzeug.security module to compare an existing hash with a candidate password to see if the hashes match.
In addition to working with users, it can help to see how we acquire the initial list of users to work with. Larger applications may use databases to help coordinate work among multiple servers. For smaller applications, a shared file will work out very nicely.
Reading and writing the Users is something we need to defer to Chapter Nine, where we talk in detail about serialization techniques.
Here are the other methods of the User class to convert from CSV dictionaries to objects and from objects back to a dictionary suitable for CSV writing.
```python
from typing import Any, Dict, Optional


class User:
    # ... __init__(), set_password(), valid_password(), __repr__() as above ...

    @staticmethod
    def from_dict(csv_row: Dict[str, str]) -> "User":
        return User(
            username=csv_row["username"],
            email=csv_row["email"],
            real_name=csv_row["real_name"],
            role=Role(csv_row["role"]),
            password=csv_row["password"],
        )

    def __eq__(self, other: Any) -> bool:
        # Compares the identifying fields; the password hash is not compared.
        if not isinstance(other, User):
            return NotImplemented
        return all(
            [
                self.username == other.username,
                self.email == other.email,
                self.real_name == other.real_name,
                self.role == other.role,
            ]
        )

    def asdict(self) -> Dict[str, Optional[str]]:
        return {
            "username": self.username,
            "email": self.email,
            "real_name": self.real_name,
            "role": self.role.value,
            "password": self.password,
        }
```
For now, we'll provide a common template for creating objects from CSV source data.
The User class has a from_dict() method to create new instances from the dictionaries read by a CSV reader.
A method with the @staticmethod decorator is part of the class -- as a whole -- and is not associated with a specific instance. It has no self variable, and is usually evaluated via User.from_dict(); the class name acts as a namespace instead of an object.
The asdict() method is the opposite of the from_dict() method. This emits a dictionary object, usable for creating CSV files.
The name asdict() parallels the name of a function in the dataclasses module. We acknowledge that some people don't like these inconsistent naming styles.
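The from_dict() and asdict() pair is shaped for the csv module's DictReader and DictWriter. Here's a small, self-contained sketch of that round trip, using an in-memory io.StringIO in place of a users file; the rows are plain dictionaries standing in for User.asdict() results, and write_rows(), read_rows(), and the sample values are made up for illustration.

```python
import csv
import io
from typing import Any, Dict, Iterable, List

HEADERS = ["username", "email", "real_name", "role", "password"]


def write_rows(rows: Iterable[Dict[str, Any]]) -> str:
    """Write dictionaries (like User.asdict() output) as CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, HEADERS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text: str) -> List[Dict[str, str]]:
    """Read CSV text back into dictionaries suitable for User.from_dict()."""
    return list(csv.DictReader(io.StringIO(text, newline="")))
```

One detail worth knowing: the csv writer emits None as an empty string, so a user saved without a password hash comes back with password equal to "", which valid_password() treats the same as a missing hash.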
For testing purposes, we added an __eq__() method. This lets us trivially compare two User objects.
The Users collection will use these methods to build the User instances. We'll look at the Users collection class next.
Individual User objects are collected into a Flask extension class, Users. This class has an important Flask-specific feature: the init_app() method.
This method binds the extension to the current Flask server; it provides configuration information that the extension can use. In this case, the Flask application's configuration can provide the USER_FILE parameter to name the file of users to load.
Our Users class is an example of Design by Composition. We've wrapped a dictionary into a class that offers some extra methods. The internal dictionary is a mapping from user name to User instance. The extra methods load User instances from a CSV file and save them back to it.
```python
import csv
from pathlib import Path
from typing import Dict, Optional

from flask import Flask


class Users:
    """A Flask extension wrapping a dict that maps username to User."""

    def __init__(self, init: Optional[Dict[str, User]] = None) -> None:
        self.users = init or {}
        # Returned for unknown usernames; it has no password hash.
        self.anonymous = User("", "", "", Role.UNDEFINED)
        self.app: Optional[Flask] = None

    def init_app(self, app: Flask) -> None:
        self.app = app
        self.app.config.setdefault("USER_FILE", Path("users.csv"))

    def get_user(self, name: str, default: Optional[User] = None) -> User:
        if not self.app:
            raise RuntimeError("Users not bound to an app")
        if not self.users:
            self.load()  # lazy loading: read the file on first use
        return self.users.get(name, default or self.anonymous)

    def load(self) -> None:
        """Load file when needed."""
        if not self.app:
            raise RuntimeError("Users not bound to an app")
        with self.app.config["USER_FILE"].open(newline="") as user_file:
            row_iter = csv.DictReader(user_file)
            user_iter = (User.from_dict(row) for row in row_iter if row)
            self.users = {user.username: user for user in user_iter}

    def save(self) -> None:
        if not self.app:
            raise RuntimeError("Users not bound to an app")
        with self.app.config["USER_FILE"].open("w", newline="") as user_file:
            writer = csv.DictWriter(user_file, User.headers)
            writer.writeheader()
            writer.writerows(u.asdict() for u in self.users.values())
```
The get_user() method uses a technique called "lazy loading". This method reads the data from the file only after a request comes in that needs validation. We can expand this technique to check file timestamps and reload the file only after it changes. A number of web servers can then share a single file, reloading it after the file changes.
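The timestamp check mentioned above might look like the following sketch. ReloadingFile is a hypothetical helper, not part of the book's Users class: it remembers the file's modification time (Path.stat().st_mtime_ns) and parses the file again only when that value changes. A production version would also need to cope with a file that's being rewritten or is briefly missing.

```python
from pathlib import Path
from typing import Callable, Generic, Optional, TypeVar, cast

T = TypeVar("T")


class ReloadingFile(Generic[T]):
    """Parse a file lazily; parse it again only if its mtime changes."""

    def __init__(self, path: Path, parse: Callable[[Path], T]) -> None:
        self.path = path
        self.parse = parse
        self._mtime: Optional[int] = None
        self._value: Optional[T] = None
        self.loads = 0  # number of times the file was actually parsed

    def get(self) -> T:
        mtime = self.path.stat().st_mtime_ns
        if self._mtime is None or mtime != self._mtime:
            self._value = self.parse(self.path)
            self._mtime = mtime
            self.loads += 1
        return cast(T, self._value)
```

In a Users-like class, get_user() could call get() on one of these instead of testing whether the dictionary is empty. The cost is one stat() call per request, which is much cheaper than re-reading the file.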
Now that we've defined a single User and the collection in the Users class, we can look at the details of authentication for a given request.
The first part of the authentication design is the exception we'll raise if there's a problem.
```python
from typing import Any, Dict, Optional


class NotAuthorized(Exception):
    status_code = 401  # default HTTP status for this exception

    def __init__(
        self,
        message: str,
        status_code: Optional[int] = None,
        payload: Optional[Dict[str, str]] = None,
    ) -> None:
        super().__init__(message)
        self.message = message
        if status_code is not None:
            self.status_code = status_code
        self.payload = payload

    def to_dict(self) -> Dict[str, Any]:
        rv: Dict[str, Any] = dict(self.payload or ())
        rv["message"] = self.message
        return rv
```
This is a common design for customized exception classes in a Flask context. We've provided a default status code. The object initialization requires a message; it can also accept a status code override and a payload of additional details.
The super().__init__(message) expression passes the message string to the superclass, Exception, to initialize the arguments there. This makes sure that the object has all of the behaviors of the built-in Exception class.
The second part of the authentication design is the @authenticate decorator function. A decorator function transforms a function into a new function. In this case, we want to expand the application's view functions with a standardized authentication check.
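```python
import base64
import binascii
from functools import wraps
from typing import Any, Callable

from flask import Response, g, request


def authenticate(view_function: Callable[..., Response]) -> Callable[..., Response]:
    @wraps(view_function)
    def decorated_function(*args: Any, **kwargs: Any) -> Response:
        # Expect "Authorization: Basic <base64 of username:password>".
        auth_body = request.headers.get("Authorization", "").split(" ")
        auth_type, credentials = auth_body if len(auth_body) == 2 else ("", ":")
        try:
            username, _, password = (
                base64.b64decode(credentials).decode("utf-8").partition(":")
            )
        except (binascii.Error, UnicodeDecodeError):
            raise NotAuthorized("Unknown User")
        # Unknown names become the anonymous user; the password check still runs.
        g.user = users.get_user(username)
        conditions = [
            auth_type.upper() == "BASIC",
            g.user.valid_password(password),
        ]
        if not all(conditions):
            raise NotAuthorized("Unknown User")
        # Flask passes URL rule variables as keyword arguments.
        return view_function(*args, **kwargs)

    return decorated_function
```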
The purpose of a decorator is to create a new function, decorated_function(), that uses the existing function, view_function(). The new function can pre-process parameters or post-process return values. In this case, the decorated function will examine the request headers to see if the Authorization header reflects a known user and their credentials.
As with the werkzeug.security.check_password_hash() function, we want to do the processing in as consistent a way as possible. The idea is to make sure the application behaves similarly for an unknown user and for a known user with a faulty password.
The type hints describe view functions using Callable[..., Response]. This syntax describes any function with arbitrary arguments that creates a Flask Response object.
Flask version 1.1.2, used for this book, does not have type hints.
The make_response() function of Flask can deal with at least seven kinds of return values from a view function.
To be a little more precise, we could use the following definition.
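```python
from typing import Any, Callable, Dict, List, Tuple, Union

from flask import Response

ReturnValue = Union[
    str, bytes, Dict[str, Any],
    Tuple[Any, ...], Response, Callable[..., List[str]]
]
```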
Our application will not make use of all of these very general features. Indeed, we're going to rely on Flask's jsonify() function to create Response documents. This lets us claim that all view functions in our application are Callable[..., Response].
Now that we have an authentication decorator, we can look at the Flask application and use the decorator to create view functions.
The Flask object is the application. We use this application object to define a number of web application features. We'll start with a customized error handler.
```python
from typing import cast

from flask import Flask, Response, jsonify

app = Flask(__name__)
app.config.from_object(Demo)  # os.environ["CLASSIFIER_CONFIG"]
users = Users()
users.init_app(app)


@app.errorhandler(NotAuthorized)
def handle_unauthorized(error: NotAuthorized) -> Response:
    # Turn the exception into a JSON document with the exception's status.
    response = cast(Response, jsonify(error.to_dict()))
    response.status_code = error.status_code
    return response
```
The app object is the Flask application. We immediately provide a configuration. We'll look at the configuration objects below. They're classes that provide the kinds of definitions that might change between development and production.
The Users collection will contain all of the user account information. We use the Users.init_app() method to bind the collection to the current application. This gives the collection access to the application's configuration.
The @app.errorhandler decorator binds an error-handling function to a specific subclass of exceptions. In this case, any NotAuthorized exceptions will be specially handled by this function. This function transforms the exception into a conventional Response object.
There are a number of other exceptions we can consider providing.
A customized invalid request exception that is raised due to bad or unparseable data.
A customized rejected requests exception due to duplicate training set names or other kinds of internal data conflicts.
As we implement these other exceptions, we should consider creating a hierarchy of special exception classes. We can then have a single handler for all of our application-specific exceptions.
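One way to build such a hierarchy is sketched below. The class names ClassifierError, InvalidRequest, and RejectedRequest, and the status codes chosen for them, are assumptions for illustration; NotAuthorized is shown re-parented under the new base class. Flask searches for error handlers along the exception's class hierarchy, so a single @app.errorhandler(ClassifierError) function could serve every subclass. The sketch leaves Flask out and shows only the classes and a handler-style function that produces the body and status code.

```python
from typing import Any, Dict, Optional, Tuple


class ClassifierError(Exception):
    """Hypothetical base class for application-specific errors."""

    status_code = 400

    def __init__(
        self,
        message: str,
        status_code: Optional[int] = None,
        payload: Optional[Dict[str, Any]] = None,
    ) -> None:
        super().__init__(message)
        self.message = message
        if status_code is not None:
            self.status_code = status_code
        self.payload = payload

    def to_dict(self) -> Dict[str, Any]:
        body: Dict[str, Any] = dict(self.payload or {})
        body["message"] = self.message
        return body


class NotAuthorized(ClassifierError):
    status_code = 401


class InvalidRequest(ClassifierError):
    """Bad or unparseable data in the request."""

    status_code = 400


class RejectedRequest(ClassifierError):
    """A conflict, such as a duplicate training set name."""

    status_code = 409


def handle_classifier_error(error: ClassifierError) -> Tuple[Dict[str, Any], int]:
    """One handler for every subclass; a Flask version would jsonify() the body."""
    return error.to_dict(), error.status_code
```

Each subclass only overrides its default status code, so adding a new kind of error is a two-line class.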
Here's our first view function:
```python
from typing import Any, Dict, cast

from flask import Response, jsonify


@app.route("/health")
def user_list() -> Response:
    # Be sure the users database gets loaded.
    users.get_user("")
    response: Dict[str, Any] = {"status": "OK", "user_count": len(users.users)}
    if app.config["TESTING"]:
        response["users"] = [u.asdict() for u in users.users.values()]
    return cast(Response, jsonify(response))
```
This is a health check that can be used to confirm the application is running. It does not require a known user. It makes a special get_user() request that may force the user data file to be loaded. Since the empty user name isn't valid, this will return the special "anonymous" user defined in the Users class.
When we're testing, it can help to return all the users. This makes it easier to confirm which user file was loaded.
We can start this server with the flask run command at the command line. This requires an environment variable, FLASK_APP, to name the module with the application.
```shell
export PYTHONPATH=src
export FLASK_APP=classifier.py
export FLASK_ENV=development
flask run
```
Next, we can look at how we use the @authenticate decorator to create a view that only works for requests that include a valid Authorization header.
We'll start with a very simple view function that shows how authentication works.
Here's what we add to our classifier.py application.
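```python
from typing import cast

from flask import Response, g, jsonify, request


@app.route("/whoami")
@authenticate
def who_am_i() -> Response:
    # g.user was set by the @authenticate decorator.
    app.logger.info(f"whoami with {request.headers}: User {g.user}")
    return cast(
        Response,
        jsonify(
            {
                "status": "OK",
                "user": g.user.asdict(),
            }
        )
    )
```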
The route is "/whoami". The full URL will be http://127.0.0.1:5000/whoami when the server is running on the local machine.
The @authenticate decorator wraps the view function in the authentication check. This will look at the request.headers to be sure the request comes from a known user.
The view function writes to the log, and returns details of the User object.
This is enough to confirm that we can use authenticated view functions.
We can use a command like curl -v to see all of the details of the exchange with the server. This can be helpful debugging information.
We can also use curl -w 'status %{response_code}' http://127.0.0.1:5000/whoami to write the final status code; this can be a helpful summary. (Windows folks must use %% instead of %.)
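As an alternative to curl, the same request can be built from Python with the standard library's urllib.request, which is handy for scripted checks. This is a sketch, assuming the development server is listening at http://127.0.0.1:5000 and replies with JSON; whoami_request() and call() are hypothetical helpers. It sends a Basic Authorization header, as curl's -u option does. For a 401 response, urlopen() raises urllib.error.HTTPError, which carries the status code and the response body.

```python
import base64
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Tuple


def whoami_request(
    username: str, password: str, base_url: str = "http://127.0.0.1:5000"
) -> urllib.request.Request:
    """Build a GET /whoami request with an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/whoami", headers={"Authorization": f"Basic {token}"}
    )


def call(request: urllib.request.Request) -> Tuple[int, Dict[str, Any]]:
    """Send the request; return (status code, decoded JSON body)."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as error:
        return error.code, json.loads(error.read() or b"{}")
```

With the server running, call(whoami_request("noriko", "Hunter2")) should return 200 and the user details for a known user, or 401 and the "Unknown User" message otherwise.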
The curl request with -u 'noriko:Hunter2' should create the following Authorization header:
> Authorization: Basic bm9yaWtvOkh1bnRlcjI=
This will be parsed by our authentication decorator.
This is a base64 encoding of the username and the password. When used over HTTP, it's easily viewed. When used over HTTPS, however, it's as secure as the keys defined in the certificates used to create the connection.
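The parsing step inside the authentication decorator can be pulled out as a pure function, which makes it easy to exercise without a running server. parse_basic_auth() is a hypothetical helper that mirrors the decorator's logic, but it returns empty strings for anything malformed instead of raising. It also shows how little protection base64 offers on its own: it's an encoding, not encryption.

```python
import base64
import binascii
from typing import Tuple


def parse_basic_auth(header: str) -> Tuple[str, str]:
    """Return (username, password) from a Basic Authorization header value."""
    auth_type, _, credentials = header.partition(" ")
    if auth_type.upper() != "BASIC":
        return "", ""
    try:
        decoded = base64.b64decode(credentials, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return "", ""
    # Split on the first colon only, so a password may itself contain colons.
    username, _, password = decoded.partition(":")
    return username, password
```

Anyone who can see the header can run this function, which is why the header should travel only over HTTPS.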
The on-line code repository has details on setting up HTTPS to create a secure web server. It doesn't involve any new Python programming.