Versioning Azure ML Pipeline Endpoints

The other day I was looking into some Azure ML features for a customer who needed Azure ML Pipeline Endpoints. While playing around with them, I noticed a couple of things.

But first, what are Azure ML Pipeline Endpoints?

A pipeline is the series of steps that need to be executed to train a machine learning model. For example, you might have one step to prepare your data, a second to train your model, and a third to register it.

This pipeline needs to be started each time you want to train the model.

Now imagine you need to retrain your model on a schedule. You can publish that pipeline to an endpoint, which means you can start the training process via a REST API and kick it off from any application or scheduling tool.
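As a rough sketch of what that REST call looks like, the snippet below builds (but does not send) the POST request using only the Python standard library. The endpoint URL, token, experiment name, and the helper name build_trigger_request are placeholders I made up for illustration. The ExperimentName field in the JSON body is what a published pipeline's REST endpoint expects, and the bearer token would come from your own Azure AD authentication.

```python
import json
import urllib.request


def build_trigger_request(endpoint_url, aad_token, experiment_name):
    """Build the POST request that starts a run of a published pipeline."""
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {aad_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values, not a real endpoint or token.
request = build_trigger_request(
    "https://example.invalid/pipelines/placeholder-id",
    "placeholder-token",
    "scheduled-retraining",
)
# urllib.request.urlopen(request) would actually submit the run.
```

Because nothing is sent until urlopen is called, you can inspect the request first, or hand the same URL, headers, and body to whatever scheduling tool you use.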


So what were my issues?

You cannot delete a pipeline endpoint.

Yes, you are reading that correctly. Once you publish a pipeline, there is no way of removing it. The only thing you can do is disable it.

There have been requests to add this functionality. But I hope I am not the only one who would even think about redeploying an environment just to get it clean.

But you can version them! Or can you?

So with that in mind, what happens when I publish it again? Since the SDK lets you pass a version number, you would expect it to overwrite the existing one. That is not what happens. It creates a new pipeline endpoint with the same name but a different Endpoint ID.

The Microsoft documentation tells you to use pipeline.publish(). It does what it says, but it creates a new endpoint each time you execute it.

So how can you fix this? After reading a bit more about the endpoints, it turns out there are two different kinds.

The first is what pipeline.publish() creates, a PublishedPipeline, and the second is the PipelineEndpoint. The first is a single endpoint and cannot be updated.

The second is a group of published pipelines that are versioned under one endpoint. Even in the workspace, you can see the difference.
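To make the difference concrete, here is a toy in-memory model of the behaviour I saw. It is not the Azure ML SDK: the publish function and the ToyEndpoint class are invented for illustration only. Publishing always hands back a fresh ID, while the endpoint keeps one stable ID and collects versions.

```python
import itertools
import uuid


def publish(name):
    """Toy stand-in for pipeline.publish(): every call yields a new ID."""
    return {"name": name, "id": str(uuid.uuid4())}


class ToyEndpoint:
    """Toy stand-in for a versioned endpoint with one stable ID."""

    def __init__(self, name, published):
        self.name = name
        self.id = str(uuid.uuid4())
        self.versions = {}
        self.default_version = None
        self._counter = itertools.count()
        self.add_default(published)

    def add_default(self, published):
        # Store the published pipeline as a new version and make it the default.
        version = str(next(self._counter))
        self.versions[version] = published
        self.default_version = version


first = publish("retrain")
second = publish("retrain")  # same name, different ID

endpoint = ToyEndpoint("retrain", first)
stable_id = endpoint.id
endpoint.add_default(second)  # new version, endpoint ID unchanged
```

Anything that calls the endpoint by its ID keeps working after a republish, which is exactly what you want for a scheduled retraining job.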

So instead of publishing like this:

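from azureml.pipeline.core import Pipeline

# ws, prep_step and train_step are assumed to be defined earlier
pipeline = Pipeline(ws, steps=[prep_step, train_step])

# Creates a new published pipeline (with a new ID) on every call
published_pipeline = pipeline.publish(
    name="PipelineWithParameters",
    description="A published pipeline with parameters",
    version="1.0")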

Do this:

from azureml.pipeline.core import Pipeline, PipelineEndpoint

pipelineName = "NewPipelineOverriden"
pipelineDescription = "A published pipeline with parameters"

# ws, prep_step and train_step are assumed to be defined earlier
pipeline = Pipeline(ws, steps=[prep_step, train_step])

# Publishing always creates a new published pipeline
published = pipeline.publish(name=pipelineName)

try:
    pipeline_endpoint = PipelineEndpoint.get(workspace=ws, name=pipelineName)
except Exception:
    # Endpoint does not exist yet: create it with this pipeline as its first version
    pipeline_endpoint = PipelineEndpoint.publish(
        workspace=ws,
        name=pipelineName,
        pipeline=published,
        description=pipelineDescription,
    )
else:
    # Endpoint exists: add the new pipeline as its default version
    pipeline_endpoint.add_default(published)

Hope this helps you out in your next Azure ML project!