Sunday, February 6, 2022

Malthe Borch: PowerShell Remoting on Windows using Airflow

Apache Airflow is an open-source platform that allows you to programmatically author, schedule and monitor workflows. It comes with out-of-the-box integrations for lots of systems, but the adage that the devil is in the details holds true for integration in general, and remote execution is no exception – in particular PowerShell Remoting, which comes with Windows as part of WinRM (Windows Remote Management).

In this post, I'll share some insights from a recent project on how to use Airflow to orchestrate the execution of Windows jobs without giving up on security.

Traditionally, job scheduling was done using agent software. An agent running locally as a system service would wake up and execute jobs at the scheduled time, reporting results back to a central system.

The configuration of the job schedule is either done by logging into the system itself or using a control channel. For example, the agent might connect to a central system to pull down work orders.

Meanwhile, Airflow has no such agents! Conveniently, WinRM works in push mode. It's a service running on Windows that you connect to using HTTP (or HTTPS). It's basically like connecting to a database and running a stored procedure.

From a security perspective, push mode is fundamentally different because traffic is initiated externally. While we might want to implement a thin agent to overcome this difference, such code is a liability on its own. Luckily, PowerShell Remoting comes with a framework that allows us to substantially limit the attack surface.

The aptly named Just-Enough-Administration (JEA) framework is basically sudo on steroids. It allows us to use PowerShell as an API, constraining the remote management interface to a configurable set of commands and executing as a specific user.

We can avoid running arbitrary code entirely by encapsulating the implementation details in predefined commands. In addition, we also separate the remote user that connects to the WinRM service from the user context that executes commands.

You can use PowerShell Remoting without JEA and/or constrained endpoints. But Airflow and Windows typically intersect in bigger companies and organizations, where security concerns mean that you'll want both.

As an aside, I mentioned stored procedures earlier on. Using JEA to change context to a different user is the equivalent of Definer's Rights as opposed to Invoker's Rights. Arguably, in a system-to-system integration, using Definer's Rights helps reduce the attack surface because you can define and encapsulate the required functionality.
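
To make this concrete, here's a rough sketch of what the client side looks like using pypsrp, the Python library underneath the Airflow provider (more on that below). The hostname, endpoint name and command are hypothetical:

from pypsrp.powershell import PowerShell, RunspacePool
from pypsrp.wsman import WSMan

# Hypothetical host and JEA endpoint name.
wsman = WSMan("windows.example.com", auth="kerberos", ssl=False)

# Connecting to a constrained endpoint; only the commands exposed
# by the role capabilities can be called in this session.
with RunspacePool(wsman, configuration_name="AirflowJEA") as pool:
    ps = PowerShell(pool)
    ps.add_cmdlet("Invoke-Job1")  # a predefined command, not arbitrary script
    ps.invoke()
    print(ps.output)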

The steps required to register a JEA configuration are relatively straight-forward. I won't describe them in detail here but the following bullets should give an overview:

- Define a role capabilities file (.psrc) listing the commands that should be visible to the role, e.g. under "VisibleCmdlets".
- Define a session configuration file (.pssc) which maps connecting users or groups to that role and, optionally, sets the account that commands run as.
- Register the endpoint using Register-PSSessionConfiguration, giving it the configuration name that clients connect with.

In summary, registering a JEA configuration can be as simple as defining a single role capabilities file and running a command to register the configuration.

Now, enter Airflow!

To get started, you'll need to add the PowerShell Remoting Protocol Provider to your Airflow installation.

Add a connection by providing the hostname of your Windows machine, username and password. If you're using HTTP (rather than HTTPS) then you should set up the connection to require Kerberos authentication such that credentials are not sent in clear text (in addition, WinRM will encrypt the protocol traffic using the Kerberos session key).

To require Kerberos authentication, provide {"auth": "kerberos"} in the connection extras. Most of the extra configuration options from the underlying Python library pypsrp are available as connection extras. For example, a JEA configuration (if using) can be specified using the "configuration_name" key.
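
For reference, here's a sketch of how such a connection might be defined in code; the connection id, hostname, user and endpoint name are all hypothetical:

import json

from airflow.models import Connection

conn = Connection(
    conn_id="windows_host",
    conn_type="psrp",
    host="windows.example.com",
    login="svc-airflow@EXAMPLE.COM",
    extra=json.dumps({
        "auth": "kerberos",
        "configuration_name": "AirflowJEA",  # the JEA endpoint, if using
    }),
)

# The URI form can be exported as e.g. AIRFLOW_CONN_WINDOWS_HOST.
print(conn.get_uri())

With Kerberos, the password can typically be omitted as long as the worker has a valid ticket (for example, obtained from a keytab).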

You will need to install additional Python packages to use Kerberos. Here's a requirements file with the necessary dependencies:

apache-airflow-providers-microsoft-psrp
gssapi
krb5
pypsrp[kerberos]

Finally, a note on transport security. When WinRM is used with an HTTP listener, Kerberos authentication (acting as trusted 3rd party) supplants the use of SSL/TLS through the transparent encryption scheme employed by the protocol. You can configure WinRM to support only Kerberos (by default, "Negotiate" is also enabled) to ensure that all connections are secured in this way. Note that your IT department might still insist on using HTTPS.

Historically, Windows machines feel worse over time for no particular reason. It's common to restart them once in a while. We can use Airflow to do that!

from airflow import DAG
from airflow.providers.microsoft.psrp.operators.psrp import PSRPOperator

default_args = {
    "psrp_conn_id": <connection id>
}

with DAG(..., default_args=default_args) as dag:
    # "task_id" defaults to the value of "cmdlet", so we can omit it here.
    restart_computer = PSRPOperator(cmdlet="Restart-Computer", parameters={"Force": None})

This will restart the computer forcefully (which is not a good idea, but it illustrates the use of parameters). In the example, "Force" is a switch so we pass a value of None – but values can be numbers, strings, lists and even dictionaries.
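
For instance, here's a quick sketch (the task id and path are made up) that exercises a few of those value types:

find_logs = PSRPOperator(
    task_id="find_logs",
    cmdlet="Get-ChildItem",
    parameters={
        "Path": "C:\\Logs",             # string
        "Include": ["*.log", "*.tmp"],  # list
        "Depth": 2,                     # number
        "Recurse": None,                # switch
    },
)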

In the first example, we saw how task_id defaults to the value of cmdlet – that is sometimes useful, but it's not the only way we can cut verbosity.

PowerShell cmdlets (and functions which for our purposes are the same thing) follow the naming convention verb-noun. When we define our own commands, we can for example use the verb "Invoke", e.g. "Invoke-Job1". But invoking stuff is something we do all the time in Airflow and we don't want our task ids to have this meaningless prefix all over the place.

Here's an example of fixing that, making good use of Airflow's templating syntax:

from airflow import DAG
from airflow.providers.microsoft.psrp.operators.psrp import PSRPOperator

default_args = {
    "psrp_conn_id": <connection id>,
    "cmdlet": "Invoke-{{ task.task_id }}",
}

with DAG(..., default_args=default_args) as dag:
    # "cmdlet" here will be provided automatically as "Invoke-Job1".
    job1 = PSRPOperator(task_id="Job1")

Windows can have its verb-noun naming convention and we get to have short task ids.

By default, Airflow serializes operator output using XComs – a simple means of passing state between tasks.

Since XComs must be JSON-serializable, the PSRPOperator automatically converts PowerShell output values to JSON using ConvertTo-Json and then deserializes them in Python; Airflow then reserializes the result when saving it to the database – there's room for optimization there! The point is that most of the time, you don't have to worry about it.

You can for example list a directory using Get-ChildItem and the resulting table will be returned as a list of dicts. Note that PowerShell has some flattening magic which generally does the right thing in terms of return values:

That is, functions don't really return a single value. Instead, there is a stream of output values stemming from each command being executed.
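
Assuming this sits inside a DAG block like the earlier examples (the task ids and path are hypothetical), the result can be consumed downstream like any other XCom:

from airflow.decorators import task

list_temp = PSRPOperator(
    task_id="list_temp",
    cmdlet="Get-ChildItem",
    parameters={"Path": "C:\\Temp"},
)

@task
def count_items(items):
    # Each PowerShell output object arrives as a dict.
    return len(items)

count_items(list_temp.output)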

With do_xcom_push set to False, no XComs are saved and the conversion to JSON does not happen either.
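
For example, the restart task from earlier has no output worth saving:

restart_computer = PSRPOperator(
    cmdlet="Restart-Computer",
    parameters={"Force": None},
    do_xcom_push=False,  # skip XComs and the JSON conversion entirely
)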

PowerShell has a number of other streams besides the output stream. These are logged to Airflow's task log by default. Unlike the default logging setup, the debug stream is also included unless explicitly turned off using the operator's logging_level argument – one justification for this is given in the next section.
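
For example, to drop the debug stream from the task log (a sketch, using the standard logging levels):

import logging

job1 = PSRPOperator(task_id="Job1", logging_level=logging.INFO)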

In traditional automation, command echoing has been a simple way to figure out what a script is doing. PowerShell is a different beast altogether, but it is possible to expose the commands being executed using Set-PSDebug.

from pypsrp.powershell import Command, CommandParameter

PS_DEBUG = Command(
    cmd="Set-PSDebug",
    args=(CommandParameter(name="Trace", value=1), ),
    is_script=False,
)

default_args = {
    "psrp_conn_id": <connection id>,
    "psrp_session_init": PS_DEBUG,
}

This requires that Set-PSDebug is listed under "VisibleCmdlets" in the role capabilities (like ConvertTo-Json if using XComs).

A tracing line is emitted at the debug logging level for each line passed over during execution and, as mentioned above, these are included in the task log by default. Don't enable this if you have a loop that iterates hundreds of times: you will quickly fill up the task log with useless messages.

Happy remoting!



