Running ownCloud in Kubernetes With Rook Ceph Storage
This is the first part of a two-part series about running ownCloud in Kubernetes, with a focus on high availability, scalability, and performance: what are the optimal database and storage choices?
This first part covers the basics and requirements of the setup, so that the second part can walk through the details step by step. In summary, we want to achieve the following:
- Outages of the hardware should neither lead to data loss nor availability problems.
- Rising user numbers should not cause problems, or should at least be easier to handle:
  - Storage: depending on the type of storage the servers use, Ceph performs very well and shouldn’t have problems with many users.
  - Database: depending on which ownCloud features are used, the database is the other possible bottleneck.
What is Kubernetes?
Kubernetes is an orchestrator for containers. This means that you can use Kubernetes to run containers across many different servers. Apart from running containers, it ships many other useful features, e.g. making HTTP applications reachable from the Internet through the Kubernetes Ingress feature.
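To make the Ingress feature a bit more concrete, here is a minimal sketch that builds an Ingress manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts just like YAML. The host name, the Service name, and the port are assumptions for illustration and are not taken from the setup in this series.

```python
import json


def owncloud_ingress(host: str, service_name: str, service_port: int) -> dict:
    """Build an Ingress that routes all HTTP paths of `host` to one Service."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": "owncloud"},
        "spec": {
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {
                                        "name": service_name,
                                        "port": {"number": service_port},
                                    }
                                },
                            }
                        ]
                    },
                }
            ]
        },
    }


# Hypothetical values: replace host, Service name, and port with your own.
ingress = owncloud_ingress("cloud.example.com", "owncloud", 8080)
print(json.dumps(ingress, indent=2))
```

An Ingress object only has an effect if an Ingress controller is running in the cluster, and which controller that is depends on the Kubernetes installation.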
The rest of this series requires some basic Kubernetes knowledge. If Kubernetes is still a closed book to you, you can build up that knowledge through the amazing tutorials by the Kubernetes Project.
Note: the second article of the series assumes that you have a working Kubernetes cluster with at least one Node / Worker.
What Do We Need?
Let’s start with the components which you always need. We can then work our way up to the corresponding tools and projects which can help us meet those needs in the Kubernetes environment.
Database – PostgreSQL
Let’s start with the most important component, the database.
Small ownCloud instances often use SQLite as their database. SQLite is not made for high availability, so you should think about switching to PostgreSQL, MySQL, or Oracle in any case.
Support for the Oracle Database Server is available in the ownCloud Enterprise Edition. For more information about SQLite, see When to and not to use SQLITE – FAQ – ownCloud Central.
One of the supported databases is PostgreSQL. As there are relatively lightweight operators for Kubernetes which can run a PostgreSQL cluster, we will use one of them.
But before we show the “PostgreSQL Operator”, what even is such an operator?
Simply put, an operator in Kubernetes is an automation mechanism. The operator watches certain custom objects in Kubernetes, whose types are defined through so-called CustomResourceDefinitions, and reacts when such an object is created, changed, or deleted.
For a PostgreSQL operator this means, as an example: when a PostgreSQL object is created, the operator reacts to it and automatically creates other Kubernetes objects (e.g. Services, Deployments, StatefulSets) to create a PostgreSQL cluster.
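The operator idea can be sketched in a few lines of Python. This is a deliberately simplified toy, not how any real operator is implemented: the `ExamplePostgres` kind, the `example.org/v1` API group, and the `reconcile` function are hypothetical, and the generated objects are incomplete (for instance, the StatefulSet has no Pod template).

```python
from typing import Dict, List


def reconcile(custom_object: Dict) -> List[Dict]:
    """Return the child objects a toy operator would create for one custom object."""
    name = custom_object["metadata"]["name"]
    replicas = custom_object["spec"]["replicas"]
    labels = {"app": name}

    # A Service so that clients can reach the database under a stable name.
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {"selector": labels, "ports": [{"port": 5432}]},
    }
    # A StatefulSet that would run the database Pods (Pod template omitted).
    stateful_set = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
        },
    }
    return [service, stateful_set]


# Hypothetical custom object, as a user might create it in the cluster.
example = {
    "apiVersion": "example.org/v1",
    "kind": "ExamplePostgres",
    "metadata": {"name": "owncloud-db"},
    "spec": {"replicas": 2},
}
children = reconcile(example)
print([child["kind"] for child in children])
```

A real operator runs this kind of logic continuously against the Kubernetes API and also handles updates and deletions, but the principle is the same: one small custom object goes in, several standard Kubernetes objects come out.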
In this article we will use Zalando’s Postgres Operator to run a PostgreSQL cluster in Kubernetes.
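As a rough preview, a custom object for Zalando’s Postgres Operator could look like the following sketch. The team ID, cluster name, instance count, volume size, user, database name, and PostgreSQL version are placeholder assumptions and not the values used later in this series. Please check the operator’s documentation for the fields supported by your version.

```python
import json


def postgres_cluster(team: str, name: str, instances: int, volume_size: str) -> dict:
    """Build a `postgresql` custom object for Zalando's Postgres Operator."""
    return {
        "apiVersion": "acid.zalan.do/v1",
        "kind": "postgresql",
        # To my knowledge the operator expects the name to start with the team ID.
        "metadata": {"name": f"{team}-{name}"},
        "spec": {
            "teamId": team,
            "numberOfInstances": instances,
            "volume": {"size": volume_size},
            # Assumed user and database names, for illustration only.
            "users": {"owncloud": []},
            "databases": {"owncloud": "owncloud"},
            # Placeholder version: pick one that your operator release supports.
            "postgresql": {"version": "12"},
        },
    }


manifest = postgres_cluster("acid", "owncloud", instances=2, volume_size="10Gi")
print(json.dumps(manifest, indent=2))
```

Once the operator is installed, applying such an object is what triggers it to create the Services, StatefulSets, and other objects that make up the PostgreSQL cluster.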
ownCloud needs storage to save uploaded files. The database needs storage, too, e.g. for user logins, app data, and shares.
For ownCloud, file system storage like NFS makes the most sense. The reason for using file system storage instead of block storage is that block storage was never intended for more than one writer at a time, while several ownCloud instances need to read and write the same files.
You could also integrate Object Storage like AWS S3 into ownCloud, but in this series we will limit ourselves to the usage of file system storage.
For PostgreSQL, however, you should definitely use block storage if you want the best performance. The reason is that the database can write more directly to block storage, and the Linux kernel can assist with caching.
Now that we have answered the question of which type of storage is best for which part of the setup, let’s talk about the storage software.
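In Kubernetes terms, this difference shows up in the access mode of a PersistentVolumeClaim: `ReadWriteMany` for shared file system storage, `ReadWriteOnce` for block storage with a single writer. The sketch below builds one claim of each kind. The claim names, sizes, and the StorageClass names `rook-cephfs` and `rook-ceph-block` are assumptions, so use whatever StorageClasses exist in your cluster.

```python
import json


def volume_claim(name: str, access_mode: str, storage_class: str, size: str) -> dict:
    """Build a PersistentVolumeClaim with a single access mode."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Shared file system storage: many ownCloud Pods may mount it read-write.
owncloud_files = volume_claim("owncloud-files", "ReadWriteMany", "rook-cephfs", "100Gi")
# Block storage: a single PostgreSQL Pod is the only writer.
postgres_data = volume_claim("postgres-data", "ReadWriteOnce", "rook-ceph-block", "10Gi")

for claim in (owncloud_files, postgres_data):
    print(json.dumps(claim, indent=2))
```

The second claim is shown only for comparison. When a PostgreSQL operator manages the database, it usually creates the claims for the database volumes itself.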
The Ceph Project has been around since about 2006. The highest priority for Ceph is keeping data safe. Perfect for us, as we do not want to lose any of our valuable data, whether vacation photos, music, or important documents.
You don’t have to worry about whether Ceph will continue to be developed: the Ceph Foundation supports Ceph centrally, to push the already strong development even further. This shows again how good it is when companies which use Open Source come together and pull in the same direction.
Ceph is very complex, but offers many features. Apart from file system storage, you can also use it for block storage and for object storage via different protocols (e.g. S3 or OpenStack Swift).
I strongly recommend reading the Intro to Ceph – Ceph Documentation to understand the basic concepts. CERN, Deutsche Telekom, and many other organizations and companies use Ceph as a storage system for their applications.
Most likely the question now arises: where is Ceph supposed to run? The question is good and easily answered: in Kubernetes, of course. Rook.io is the way to go here.
Rook enables Ceph to run in Kubernetes, as well as other software which stores persistent data, e.g. EdgeFS, Minio, CockroachDB and others.
Above, in Database – PostgreSQL, we talked about Kubernetes operators. Rook is such an operator, which reacts to Kubernetes custom objects. When it reacts to a CephCluster object, for example, it can create a Ceph cluster in Kubernetes.
Apart from creating the Ceph cluster, Rook currently also takes care of creating and deleting volumes in Ceph, while managing the corresponding PersistentVolume objects in Kubernetes.
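To give an idea of what such a CephCluster object contains, here is a minimal sketch. The object name, namespace, monitor count, and host path are assumptions, and the Ceph container image is deliberately left as a parameter because the right image depends on the Rook release in use.

```python
import json


def ceph_cluster(ceph_image: str, mon_count: int = 3) -> dict:
    """Build a minimal CephCluster custom object for the Rook operator."""
    return {
        "apiVersion": "ceph.rook.io/v1",
        "kind": "CephCluster",
        # Assumed name and namespace, for illustration only.
        "metadata": {"name": "rook-ceph", "namespace": "rook-ceph"},
        "spec": {
            "cephVersion": {"image": ceph_image},
            # Directory on each node where Rook keeps Ceph configuration data.
            "dataDirHostPath": "/var/lib/rook",
            # An odd number of monitors lets them keep quorum if one fails.
            "mon": {"count": mon_count},
            # Assumption: every node and every empty disk may be used by Ceph.
            "storage": {"useAllNodes": True, "useAllDevices": True},
        },
    }


# Hypothetical image reference: use the Ceph image recommended for your Rook release.
cluster = ceph_cluster("registry.example.com/ceph/ceph:your-version")
print(json.dumps(cluster, indent=2))
```

Letting Ceph use all nodes and all devices is convenient for a test cluster. For production you would normally list the nodes and disks explicitly.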
For everyone who is interested in containers and Kubernetes, I recommend reading Kubernetes Blog – Container Storage Interface (CSI) and Kubernetes – Persistent Volumes.
Now that we have dealt with the storage topic, there is only one component missing: Redis.
By default, the database takes care of file locking. We want to take this extra load off the database, which is why we are going to use Redis for it. For more on this topic, see Transactional File Locking.
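On the ownCloud side, moving file locking to Redis comes down to a few entries in `config.php`. The sketch below renders such a fragment from Python so that the host can be filled in per environment. The `redis_locking_config` helper and the host name are hypothetical, and how the fragment gets into the ownCloud container depends on the image you use.

```python
def redis_locking_config(host: str, port: int = 6379) -> str:
    """Render a config.php fragment that lets Redis handle file locking."""
    lines = [
        "<?php",
        "$CONFIG = [",
        "  'filelocking.enabled' => true,",
        # Raw string so the PHP namespace backslashes are kept as-is.
        r"  'memcache.locking' => '\OC\Memcache\Redis',",
        "  'redis' => [",
        f"    'host' => '{host}',",
        f"    'port' => {port},",
        "  ],",
        "];",
    ]
    return "\n".join(lines)


# Hypothetical host name: use the Service name of your Redis cluster.
fragment = redis_locking_config("redis.example.svc")
print(fragment)
```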
Again we are going to use an operator to make our life a bit easier. The KubeDB operator can run Redis as a cluster in Kubernetes.
For more information about the Redis part of the KubeDB operator, take a look at the KubeDB documentation.
For a final overview of how this will look in Kubernetes, here is a diagram with the components:
To summarize it in bullet points:
- Kubernetes to run ownCloud and the other components as containers.
- An Ingress controller, which depends on the Kubernetes installation, to make ownCloud accessible from the Internet.
- Zalando’s Postgres operator for PostgreSQL clusters in Kubernetes.
- KubeDB operator for Redis clusters in Kubernetes.
- Ceph storage via the Rook.io operator in Kubernetes.
In the second part of this article series, we will execute this plan step by step to run ownCloud in Kubernetes in a redundant and highly available way.