Kubeflow is a modern solution to design, build and orchestrate Machine Learning pipelines using the latest and most popular frameworks. Out of the box, Kubeflow ships with a bundled MinIO to store all of its pipelines, artifacts and logs. However, that MinIO is limited to a single PVC and thus cannot benefit from the features a distributed MinIO brings to the table, such as Active-Active Replication and unlimited storage via Tiering.
In this blog post we are going to configure Kubeflow to use a large MinIO Tenant on the same Kubernetes cluster, but of course, this configuration applies to Kubeflow and MinIO being on different clusters as well. For your reference, please see our earlier blog post, Machine Learning Pipelines with Kubeflow and MinIO on Azure, and the Kubeflow site.
While we go from soup to nuts in this blog post, if you already have a Kubeflow setup and a MinIO setup, you can skip straight to the Configure Kubeflow section to see what needs to be configured.
Setting up the MinIO Operator
Let’s start by installing the MinIO Operator and creating a tenant that Kubeflow will use. My favorite way to install the MinIO Operator is via kubectl apply -k, but we also have Helm Charts available, and we are also available on the AWS Marketplace, Google Cloud Marketplace and Azure Marketplace.
kubectl apply -k github.com/minio/operator/
This will install the latest and greatest MinIO Operator. Now we just need to log into the Operator UI and create a tenant. For this step we’ll get a service account JWT token to log in, but this UI can also be secured with AD/LDAP or OIDC.
kubectl -n minio-operator get secret $(kubectl -n minio-operator get serviceaccount console-sa -o jsonpath="{.secrets[0].name}") -o jsonpath="{.data.token}" | base64 --decode && echo ""
Now let’s port forward the UI and log in.
kubectl -n minio-operator port-forward svc/console 9090
Now open a browser, go to http://localhost:9090 and log in with the JWT token we got in the previous step.
After logging in, click on Create Tenant and set up a 1TiB tenant.
Enter the name of the new tenant and the namespace for it.
If the namespace doesn’t exist you have the option to create the namespace.
Now let’s size the tenant. I’ll be setting up a 4 node cluster with 4 drives on each node. Because we’re on Kubernetes, a node (or server) translates to a pod, and drives per server translates to PVCs per pod.
I’m also starting with 1TiB of capacity but you can always expand the capacity of the tenant.
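To make the sizing concrete, here is a small shell sketch of the arithmetic: 4 servers with 4 drives each gives 16 PVCs, so 1TiB works out to 64GiB per PVC. The helper name is ours, and it assumes the 1TiB is raw capacity split evenly across drives; usable capacity will be lower once erasure coding parity is taken into account.

```shell
# Hypothetical helper: split a tenant's raw capacity evenly across its PVCs.
# Arguments: total capacity in GiB, number of servers (pods), drives (PVCs) per server.
pvc_size_gib() {
  total_gib="$1"
  servers="$2"
  drives_per_server="$3"
  pvc_count=$(( servers * drives_per_server ))
  echo $(( total_gib / pvc_count ))
}

# 1 TiB = 1024 GiB, 4 pods, 4 PVCs per pod
echo "PVCs: $(( 4 * 4 )), size per PVC: $(pvc_size_gib 1024 4 4)GiB"
```

The same function is handy when planning an expansion, since added capacity is also divided across the new pool's PVCs.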
Let’s go to Identity Provider and create a basic user that will be used by Kubeflow. If you choose to configure an external identity provider that uses OpenID or Active Directory/LDAP, you can just go ahead and create a service account after you log in to the tenant.
Lastly, we’ll disable TLS just to keep this blog post from getting too long, but if you want to have TLS enabled on your tenant, you’ll need a certificate configured on the tenant that Kubeflow trusts.
And that’s it, just hit Create and the tenant will be created in a few minutes.
You now have distributed, high-performance, hyperscale object storage that can be expanded as your needs grow. From here, let’s configure Kubeflow to use this MinIO deployment.
Setting up Kubeflow
In this section, we’ll set up Kubeflow from scratch on Kubernetes. This works for on-premises deployments, development environments or any public cloud, although cloud providers frequently offer a pre-configured version of Kubeflow.
We’ll be using the kubeflow/manifests repository. Bear in mind there are some strict requirements for this to work. For example, the highest version of Kubernetes supported by Kubeflow 1.5.0 (at the time of writing) is 1.21, so make sure you’re using a Kubernetes cluster that meets this requirement.
One additional requirement is to have Kustomize version 3.2.0, and that’s it.
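A quick way to check an environment against these requirements is to compare version strings before installing. The sketch below uses sort -V (GNU coreutils) for the comparison. The helper name is ours and the sample versions are illustrative inputs; only 1.21 and 3.2.0 come from the requirements above.

```shell
# Hypothetical helper: succeeds when version $1 is less than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

max_k8s="1.21"               # highest Kubernetes version supported by Kubeflow 1.5.0
required_kustomize="3.2.0"   # Kustomize version required by the manifests

# Example inputs; in practice, read these from `kubectl version` and `kustomize version`.
cluster_k8s="1.21"
installed_kustomize="3.2.0"

if version_le "$cluster_k8s" "$max_k8s"; then
  echo "Kubernetes $cluster_k8s is supported"
else
  echo "Kubernetes $cluster_k8s is newer than $max_k8s"
fi

if [ "$installed_kustomize" = "$required_kustomize" ]; then
  echo "Kustomize $installed_kustomize matches"
else
  echo "Kustomize $installed_kustomize does not match $required_kustomize"
fi
```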
Let’s start by cloning the kubeflow/manifests repository:
git clone https://github.com/kubeflow/manifests
Then change into the manifests directory and run the following command:
cd manifests
This command will take a few minutes to install all the resources needed by Kubeflow. If anything fails to apply, the command will continue attempting to apply it until it succeeds entirely.
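For reference, the single-command install documented in the kubeflow/manifests README takes roughly the shape below. Treat it as a sketch and check the README for the release you’re deploying; wrapping it in a function is our choice, so that nothing runs until you call it.

```shell
# Sketch of the retry-until-applied install loop from the kubeflow/manifests README.
# Run from the root of the cloned manifests repository; needs kustomize and kubectl on PATH.
install_kubeflow() {
  # Keep re-applying until every resource is accepted. CRDs and webhooks often
  # need a moment before the resources that depend on them can be created.
  while ! kustomize build example | kubectl apply -f -; do
    echo "Retrying to apply resources"
    sleep 10
  done
}

# Call it once you are ready to install:
# install_kubeflow
```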
After a few minutes, you can confirm all the pods in the kubeflow namespace are up and running:
kubectl -n kubeflow get pods
Now we will configure Kubeflow to use our new MinIO.
Configure Kubeflow
The following section is the core of connecting Kubeflow and MinIO. Please note that the resources that need to be modified in this section are also what you’d tweak if you were starting with an existing Kubeflow deployment.
We are going to edit a variety of ConfigMaps, Secrets and Deployments in the kubeflow namespace first, and then in any existing user namespaces.
All of these steps assume MinIO is running in the ns-1 namespace and running on port 80. If you were running the tenant with TLS you’d use port 443.
Tenant URL: minio.ns-1.svc.cluster.local
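If your tenant lives in a different namespace, or has TLS enabled, the endpoint changes accordingly. As a minimal sketch, the helper below (a hypothetical name of ours) builds the in-cluster endpoint from the service name, the namespace and a TLS flag, following the port 80 / port 443 convention described above.

```shell
# Hypothetical helper: build the in-cluster endpoint for a MinIO tenant service.
# Arguments: service name, namespace, "true" if TLS is enabled on the tenant.
tenant_endpoint() {
  svc="$1"
  ns="$2"
  tls="$3"
  host="${svc}.${ns}.svc.cluster.local"
  if [ "$tls" = "true" ]; then
    echo "https://${host}:443"
  else
    echo "http://${host}:80"
  fi
}

tenant_endpoint minio ns-1 false   # prints http://minio.ns-1.svc.cluster.local:80
```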
Edit Configmaps
Edit the pipeline-install-config config map and add the following fields to .data:
minioServiceHost: minio.ns-1.svc.cluster.local
Edit command:
kubectl -n kubeflow edit cm pipeline-install-config
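If you’d rather script this change than use an interactive editor, kubectl patch with a JSON merge patch is one option. The sketch below sets only the minioServiceHost field shown above; the helper name and the APPLY switch are ours, and the patch is only printed unless you opt in.

```shell
# Value assumed from this walkthrough; adjust it to your tenant.
MINIO_HOST="minio.ns-1.svc.cluster.local"

# Hypothetical helper: build a JSON merge patch that sets minioServiceHost under .data.
build_patch() {
  printf '{"data":{"minioServiceHost":"%s"}}' "$1"
}

PATCH="$(build_patch "$MINIO_HOST")"
echo "$PATCH"

# Only touch the cluster when explicitly asked to, e.g. by running with APPLY=yes.
if [ "${APPLY:-no}" = "yes" ]; then
  kubectl -n kubeflow patch configmap pipeline-install-config --type merge -p "$PATCH"
fi
```

A merge patch leaves every other key in the config map untouched, which makes it easier to repeat across environments than a manual edit.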
Edit the configmap workflow-controller-configmap and configure the endpoint field inside the s3 section to point to your tenant:
s3:
Use this command to edit the configmap:
kubectl -n kubeflow edit cm workflow-controller-configmap
Edit the ml-pipeline-ui-configmap configmap and replace the json content of viewer-pod-template.json with the following json:
{
Use this command to edit the configmap:
kubectl -n kubeflow edit cm ml-pipeline-ui-configmap
Make sure the indentation structure of the json matches the existing format.
Edit Secrets
We will update the secret that holds the MinIO credentials. Secret values must be base64 encoded, so encode them in your shell first:
echo -n "kubeflow" | base64
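A stray trailing newline in an encoded credential is a common cause of authentication failures, so it can be worth round-tripping the value before pasting it into the secret. This is a small sketch: the helper name is ours, and "kubeflow" is just the example value used above.

```shell
# Hypothetical helper: base64-encode a value without a trailing newline.
# printf is used instead of echo -n because echo's flags vary between shells.
b64() {
  printf '%s' "$1" | base64
}

encoded="$(b64 "kubeflow")"
echo "$encoded"                        # a3ViZWZsb3c=

# Decode again to confirm the value round-trips exactly.
printf '%s' "$encoded" | base64 --decode
echo
```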
Edit the secret mlpipeline-minio-artifact and set these values in the .data field:
data:
Use this command to edit the secret:
kubectl -n kubeflow edit secret mlpipeline-minio-artifact
Edit Deployments
We edit the deployments last, because changing them triggers a pod restart that picks up all of the configuration above.
Edit the ml-pipeline-ui deployment and add the following environment variables: