Valiant and the Open Policy Agent

Valiant is an auditing tool for Python projects. It aims to provide an easy method for gathering information about project dependencies so as to help developers determine potential risks that dependencies may present. Example risks include licensing issues, known vulnerabilities and project sustainability.

The Open Policy Agent (OPA) provides a toolset for defining policies and comparing input data against the policies.

In this demo I'll demonstrate how to define OPA policies and use them to review data from Valiant. This helps determine if a project meets governance requirements.

I'm new to OPA so please let me know if you feel the policies could be improved.

Get started with Valiant

Valiant is a Python package available on PyPi.

Installing Valiant requires Python 3.8 and the standard pip command:

pip install -U valiant

Once installed, check the details with:

valiant about

Version 0.2.1 or above will be fine for this article.

Say you wanted to check the details for the Flask package:

valiant report flask 1.1.1

That displays a human-readable output but for OPA we'll need JSON:

valiant report flask 1.1.1 -o json

It's more likely that you'll want to check all of a project's dependencies against your policies. I'll use Poetry to initialise a project and add some dependencies:

pip install poetry
poetry new demo_project
cd demo_project
poetry add flask==1.1.1
poetry add insecure-package==0.1.0

You'll be able to see the project configuration in the pyproject.toml file. To see the non-development dependencies, try poetry show --no-dev. For a nice tree view, try poetry show --no-dev --tree and you'll get something similar to the output below:

flask 1.1.1 A simple framework for building complex web applications.
├── click >=5.1
├── itsdangerous >=0.24
├── jinja2 >=2.10.1
│   └── markupsafe >=0.23
└── werkzeug >=0.15
insecure-package 0.1.0 Insecure Package, don't use it

The insecure-package is a demo package that the safety scanner uses as a demonstrator package - it is safe to use.

Now that we have a (small) project set up, we can run Valiant reporting for the dependencies:

# Export the dependencies to a requirements.txt file:
poetry export --format requirements.txt --output requirements.txt --without-hashes

# Display a list of findings
poetry run valiant audit requirements.txt -s

The output will be a table of findings similar to the one below:

+--------------+-----------+----------+----------+--------------+--------------+
|   Package    |    ID     |  Level   | Category |    Title     |   Message    |
| Coordinates  |           |          |          |              |              |
+==============+===========+==========+==========+==============+==============+
| https://pypi | SPDX001   | info     | license  | SPDX License | BSD-3-Clause |
| .org/pypi :: |           |          |          | found        |              |
| click ::     |           |          |          |              |              |
| 7.1.2        |           |          |          |              |              |
+--------------+-----------+----------+----------+--------------+--------------+
| https://pypi | SPDX001   | info     | license  | SPDX License | BSD-3-Clause |
| .org/pypi :: |           |          |          | found        |              |
| Flask ::     |           |          |          |              |              |
| 1.1.1        |           |          |          |              |              |
+--------------+-----------+----------+----------+--------------+--------------+
| https://pypi | BASIC003  | warning  | project  | No link to   | The project  |
| .org/pypi :: |           |          |          | codebase     | doesn't      |
| insecure-    |           |          |          |              | provide a    |
| package ::   |           |          |          |              | link to its  |
| 0.1.0        |           |          |          |              | codebase.    |
+--------------+-----------+----------+----------+--------------+--------------+

Whilst a table is handy for us to read, we'll need JSON to feed into OPA:

# Audit the requirements with JSON output:
valiant audit requirements.txt --out json>valiant_audit.json

Hint: use the jq tool to view the audit data: cat valiant_audit.json |jq

You can find a copy of the audit output in the project repository under docs/demo/opa/valiant_audit.json.

Now that we have the basics of Valiant sorted out, let's take OPA for a quick spin.

Get started with OPA

Through this article I'll use a local copy of OPA.

You could also use an OPA docker image:

docker pull docker.io/openpolicyagent/opa

The code for used in this article is located in the docs/demo/opa directory of the Valiant GitHub project. I'll provide most of the code in this article so you don't need to get a copy of the repository (but you're always welcome to do so).

The Rego playground is a useful tool for trying out policy files. You can copy the .rego code from this article and try it out in the playground.

The OPA extension for Visual Studio Code is worth trying out if you're using VSCode.

A basic policy test

OPA policies are defined in a rego file. The code below defines a policy that requires that an MIT license be used:

basic.rego:

package basic

default allow = false

allow {
    input.license == "MIT"
}

Policy testing provides a mechanism for checking that the policy is capturing conditions correctly. The code below provides a basic test suite for our basic policy:

basic_test.rego:

package basic

test_app_allowed {
    allow with input as {"name": "valiant", "license": "MIT"}
}

test_app_not_allowed {
    not allow with input as {"name": "my_app", "license": "BSD-3-Clause"}
}

test_app_not_allowed_missing_license {
    not allow with input as {"name": "my_app"}
}

Let's run the tests:

./opa test basic -v

The output should appear as follows:

data.basic.test_app_allowed: PASS (127.108075ms)
data.basic.test_app_not_allowed: PASS (429.647µs)
data.basic.test_app_not_allowed_missing_license: PASS (421.856µs)
--------------------------------------------------------------------------------
PASS: 3/3

Consider an input that provides the correct license:

{
    "name": "test",
    "license": "MIT"
}

We can check this against the policy using:

./opa eval \
    --data basic/basic.rego \
    --input input/input_1.json  \
    --format pretty \
    'data.basic'

The result indicates that the input meets the policy requirement:

{
  "allow": true
}

A different input uses an unacceptable license:

{
    "name": "test2",
    "license": "Commercial"
}

Running an evaluation against the input:

./opa eval \
    --data basic/basic.rego \
    --input input/input_2.json  \
    --format pretty \
    'data.basic'

... and we see that false is returned and the policy is not met:

{
  "allow": false
}

Test a single package

Let's move on and start using real data. This time we want to check if a candidate dependency will meet policy requirements.

First of all, produce Valiant reports for a number of packages:

valiant report flask 1.1.1 --out json > report/flask.json
valiant report django 1.2 --out json > report/django.json
valiant report valiant 0.2.1 --out json > report/valiant.json

Note: I've used a very old version of Django so as to illustrate security findings.

Valiant uses reporting plugins to produce a set of findings for each package. You can quickly view these with valiant report django 1.2 -s or, with the json we produced just before, you can create a summary with:

cat report/valiant.json \
    | jq '.reports[].findings[] | {id:.id, package: .coordinates.name, version: .coordinates.version, title: .title, message: .message, data: .data}'

So now we have some input data to test, let's create some policies.

The policies/dependency.rego file contains all of the policies discussed in this section.

First of all, I want to make sure I'm only using packages with a correctly declared licence. The Valiant SPDC reporting plugin returns the SPDX002 finding when the package's licence could not be determined. The check below will note a violation on an SPDX002 finding:

violations[pkg] = message {
    # All packages must have a properly declared license
    findings := input.reports.spdx.findings[_]
    findings.id == "SPDX002"
    pkg := concat("::", [findings.coordinates.name, findings.coordinates.version])
    message := "SPDX002: No licence could be determined."
}

Next, I want to make sure that, where a license could be determined (SPDX001), it is an OSI approved license. The following policy will check that a license was found and it is an OSI approved license:

violations[pkg] = message {
    # All packages must have an OSI-approved license
    findings := input.reports.spdx.findings[_]
    findings.id == "SPDX001"; findings.data.is_osi_approved != true
    pkg := concat("::", [findings.coordinates.name, findings.coordinates.version])
    message := "An OSI-approved licence is required."
}

Valiant's Safety report provider uses the Safety package to determine any known vulnerabilities for a package. This check just looks for any SAFETY001 findings:

violations[pkg] = message  {
    # Any findings from the safety report needs to be raised
    findings := input.reports.safety.findings[_]
    findings.id == "SAFETY001"
    pkg := concat("::", [findings.coordinates.name, findings.coordinates.version])
    message := concat(": ", [findings.id, findings.message])
}

Finally I want to check for packages that the Basic report provider determines as not production ready. The BASIC005 finding flags packages with a development status between 1 and 4 (refer to the classifiers) and BASIC006 flags packages marked as inactive.

violations[pkg] = message {
    # Packages not marked as production-ready
    not_mature := {"BASIC005", "BASIC006"}
    findings := input.reports.basic.findings[_]
    not_mature[findings.id]
    pkg := concat("::", [findings.coordinates.name, findings.coordinates.version])
    message := concat(": ", [findings.id, findings.message])
}

The allow rule will return true if no violations were found:

allow = true {
    count(violations) == 0
}

Policy check

We can check the Flask report from Valiant against the policy:

./opa eval --data policies \
           --input report/flask.json \
           --format pretty \
           'data.valiant.demo.dependency.allow'

The output is true, indicating that the allow policy was met. That's handy for a visual check but OPA can give us something a bit more tangible:

./opa eval --data policies \
           --input report/flask.json \
           --format pretty \
           --fail-defined 'data.valiant.demo.dependency.violations[pkg]'

The resulting output of undefined indicates that there were no violations. By using the --fail-defined parameter, the exit code is set based on if any result was found. Calling echo $? displays 0, indicating that the flask.json data does not raise any policy violations.

Trying the evaluation against the valiant.json report will yield a policy violation:

./opa eval --data policies \
           --input report/valiant.json \
           --format pretty \
           --fail-defined 'data.valiant.demo.dependency.violations[pkg]'
echo $?

The exit value ($?) of 1 indicates that there were violations and the table displayed by OPA describes the issue:

+------------------+----------------------------------------------+
|       pkg        | data.valiant.demo.dependency.violations[pkg] |
+------------------+----------------------------------------------+
| "valiant::0.2.1" | "BASIC005: The package is                    |
|                  | marked as '1 - Planning'"                    |
+------------------+----------------------------------------------+

The report from the very old Django version will yield even more to be concerned about:

./opa eval --data policies \
           --input report/django.json \
           --format pretty \
           --fail-defined 'data.valiant.demo.dependency.violations[pkg]'

There are a lot of violations coming from that report! Here's an excerpt:

+---------------+----------------------------------------------+
|      pkg      | data.valiant.demo.dependency.violations[pkg] |
+---------------+----------------------------------------------+
| "Django::1.2" | "SPDX002: No licence could be                |
|               | determined."                                 |
| "Django::1.2" | "SAFETY001: Django before                    |
|               | 1.11.27, 2.x before 2.2.9,                   |
|               | and 3.x before 3.0.1 allows                  |
|               | account takeover. A suitably                 |
|               | crafted email address (that                  |
|               | is equal to an existing user's               |
|               | email address after case                     |
|               | transformation of Unicode                    |
|               | characters) would allow an                   |
|               | attacker to be sent a password               |
|               | reset token for the matched                  |
|               | user account. (One mitigation                |
|               | in the new releases is to send               |
|               | password reset tokens only                   |
|               | to the registered user email                 |
|               | address.) See CVE-2019-19844."               |
| "Django::1.2" | "SAFETY001: Cross-site                       |
|               | scripting (XSS) vulnerability                |
|               | in Django 1.2.x before 1.2.2                 |
|               | allows remote attackers to                   |
|               | inject arbitrary web script or               |
|               | HTML via a csrfmiddlewaretoken               |
|               | (aka csrf_token) cookie."                    |

Auditing a project

It's more likely that we'll want to check the full set of project dependencies against our policies. This would be handy as part of a CI/CD pipeline as we could block non-compliant projects from being deployed.

Recall that earlier I used poetry to generate the dependency list and passed this to valiant produce an audit report:

# Export the dependencies to a requirements.txt file:
poetry export --format requirements.txt --output requirements.txt --without-hashes

# Run an audit and output JSON:
valiant audit requirements.txt --out json>valiant_audit.json

The resulting JSON is an array of reports (1 per package) and each report will have 0 or more findings garnered from the various reporting plugins we've used. This is a different structure to the single-package report covered in the last section so the policies will need some changes.

The audit-policies/main.rego policy file defines a suite of checks to determine aspects of the Valiant audit report that breach policy. It's quite similar to the policy described earlier - just with some slightly different checks:

package valiant.demo

default allow = false

violations[pkg] = message {
    # Packages not marked as production-ready
    not_mature := {"BASIC005", "BASIC006"}
    findings := input[_].reports.basic.findings[_]
    not_mature[findings.id]
    pkg := concat("::", [findings.coordinates.name, findings.coordinates.version])
    message := concat(": ", [findings.id, findings.message])
}

violations[pkg] = message {
    # Only allow specifically approved licences
    permitted_licenses := {"BSD-3-Clause", "MIT"}
    findings := input[_].reports.spdx.findings[_]
    findings.id == "SPDX001"
    not permitted_licenses[findings.message]
    pkg := concat("::", [findings.coordinates.name, findings.coordinates.version])
    message := concat(": ", [findings.id, findings.message])
}

violations[pkg] = message {
    # All packages must have a properly declared license
    findings := input[_].reports.spdx.findings[_]
    findings.id == "SPDX002"
    pkg := concat("::", [findings.coordinates.name, findings.coordinates.version])
    message := concat(": ", [findings.id, findings.message])
}

violations[pkg] = message  {
    # Any findings from the safety report needs to be raised
    findings := input[_].reports.safety.findings[_]
    findings.id == "SAFETY001"
    pkg := concat("::", [findings.coordinates.name, findings.coordinates.version])
    message := concat(": ", [findings.id, findings.message])
}

allow = true {
    count(violations) == 0
}

Note: a small test suite is provided and can be run with ./opa test policies -v.

Policy check

Now that we have the policy defined and the input data ready, we can perform an evaluation:

./opa eval --data policies/ \
           --input valiant_audit.json \
           --format pretty \
           --fail-defined 'data.valiant.demo.audit.violations[pkg]'
+---------------------------+-----------------------------------+
|            pkg            | data.valiant.demo.violations[pkg] |
+---------------------------+-----------------------------------+
| "insecure-package::0.1.0" | "BASIC005: The package is         |
|                           | marked as '2 - Pre-Alpha'"        |
| "insecure-package::0.1.0" | "SPDX002: Could not map           |
|                           | licence MIT license to an SPDX    |
|                           | license"                          |
| "itsdangerous::1.1.0"     | "SPDX002: Could not map           |
|                           | licence BSD to an SPDX            |
|                           | license"                          |
| "insecure-package::0.1.0" | "SAFETY001: This is an            |
|                           | insecure package with lots        |
|                           | of exploitable security           |
|                           | vulnerabilities."                 |
+---------------------------+-----------------------------------+

As an aside, you may note that insecure-package is reported as not having an SPDX licence. Whilst the Pypi metadata for the package does provide "license": "MIT license", the reporting just looks for MIT and it's the additional license that causes a failed match. Perhaps as Valiant matures this matching will improve.

Again, by using the --fail-defined 'data.valiant.demo.violations[pkg]' parameter we can then check the exit code (echo $?) and see the command returned 1. The policy check could be wrapped in a script or some other construct in a continuous integration process that causes the build to fail. Using this approach provides an easy policy layer to the CI/CD process and can prevent deployments that don't meet policy guidelines.

Conclusion

This article has outlined a method for defining policies for a project and using Valiant to supply data for policy validation. This light-weight process can be used by a developer as part of their daily routine and pre-commit checks.

In a broader usage model, OPA could be run as a central policy server and used by CI/CD pipelines to ensure that only compliant codebases are able to be deployed.

The Valiant project is very young but it supports plugins for gathering information about Python dependencies. You can readily add reports that run under Valiant and then check that (meta)data using OPA.

Ultimately, the goal is to recognise/reduce risk in your software project's supply chain. This contributes to teams ensuring sustainability in their projects.

Further reading: