How Sciebo Works
DRAFT!
This document is still under development… subject to change without notice…
How does Sciebo actually work? Where is the data located?
On this page, we’ll explain how Sciebo works to give you some background knowledge. Understanding how the system works will help you avoid potential errors, as user errors can lead to data loss.
Where is the data located?
Physically speaking, the data is mirrored at two locations at the University of Münster. This storage system
is a dedicated server infrastructure with hundreds of individual hard drives on which the data is stored. A
special algorithm distributes the data blocks across different hard drives, so that the failure of individual
drives can be compensated for by the system, preventing data loss. In case of failure, defective drives are
replaced promptly. Another advantage of this data distribution is the very high data throughput, even though
these are still conventional magnetic hard drives. When a file is written, the write operation is distributed
and parallelized across multiple hard drives. Instead of the roughly 120 MB/s that a single drive typically
achieves, we can store data at a combined rate of several gigabytes per second. The data is then also written
to the storage system at the second location with very low latency.
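To get a feeling for the numbers, the following back-of-the-envelope sketch multiplies the per-drive rate by the number of drives that take part in a write. Only the 120 MB/s figure comes from the text above; the drive counts and the idealized linear scaling are assumptions for illustration and say nothing about the actual configuration of the storage system.

```python
import math

# Approximate sequential write rate of one magnetic drive (figure from the text).
SINGLE_DRIVE_MBPS = 120


def aggregate_throughput_mbps(parallel_drives: int,
                              per_drive_mbps: float = SINGLE_DRIVE_MBPS) -> float:
    """Ideal combined write rate if a write is spread evenly across drives.

    Assumption: perfect linear scaling, no network or controller overhead.
    """
    return parallel_drives * per_drive_mbps


def drives_needed(target_mbps: float,
                  per_drive_mbps: float = SINGLE_DRIVE_MBPS) -> int:
    """Smallest number of drives whose ideal combined rate reaches the target."""
    return math.ceil(target_mbps / per_drive_mbps)


print(aggregate_throughput_mbps(25))  # 3000 (MB/s, i.e. about 3 GB/s)
print(drives_needed(3000))            # 25
```

In practice the achievable rate will be lower than this ideal figure, but the sketch shows why distributing a write across a few dozen drives is enough to reach the gigabyte-per-second range.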
Because this storage system, like the Sciebo servers themselves, is located on university premises, we can
prevent foreign governments and their organizations from gaining unauthorized access to our users’ data. This data
sovereignty is of paramount importance to us!
From a logical perspective, the data initially resides in the individual user accounts. This means that,
according to the current concept, each user has their own data directory. The structure is hierarchical. The
owner of the data is the user in whose folder the data is stored. Specifically, this means that
if another user shares a folder with you and you save data there, that data ends up in that user’s folder and
belongs to them. Sciebo (and therefore also ownCloud/Nextcloud) does not record who stored the data there.
Consequently, this also means that any data recovery requires the (written) consent of the person who shared
that folder with you. Provided this sharing is still active, data recovery to the exact same location is quite
straightforward. However, if the sharing has expired because the person sharing it no longer uses Sciebo, we
cannot avoid certain formalities.
Otherwise, we would open the door to potential misuse. We are not permitted to do that, and the trust of our
users in these principles is very important to us.
If you now wish to access the data, there are several ways. You can access the data easily and directly via
the web interface. The various client apps are designed to simplify access to this data. Unlike the mobile
clients, the desktop clients are also able to save a local copy of your data.
This is particularly useful if internet access fails for any reason. However, the VFS (virtual file system)
option must be deactivated for this to work. If VFS is active, the client behaves similarly to the mobile version:
data is only downloaded temporarily when the respective file is accessed, and permanent local storage of a file
must be explicitly selected by the user.
Comparison with Network Drives
For these reasons, Sciebo is not comparable to a network drive. With a network drive, there is only one central data
set on the server, which all users access. Consequently, only one person can write to a file at a time. A file
lock is passed through to all other users, so they can only read it.
In contrast, when accessing Sciebo with client software, the data is copied to the client temporarily
or permanently (depending on the settings). Thus, each user has their own copy of the file, and everyone can
open this file locally (including write access), since file locks cannot currently be passed through. This can
be particularly problematic when editing Office documents. The last person to write the file wins. If multiple
write operations occur virtually simultaneously, file conflicts arise that must be resolved manually.
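The two failure modes described above can be illustrated with a small simulation. This is a deliberately simplified model, not the actual synchronization logic of the Sciebo clients: the classes Server and Client, the version counter, and all method names are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server holding one file plus a version counter."""
    content: str = "original"
    version: int = 0

    def upload(self, new_content: str, base_version: int) -> bool:
        """Accept an upload only if it is based on the current server version."""
        if base_version != self.version:
            return False  # another upload arrived in the meantime
        self.content = new_content
        self.version += 1
        return True


@dataclass
class Client:
    """Hypothetical sync client that keeps its own local copy of the file."""
    name: str
    local_content: str = ""
    base_version: int = -1
    conflicts: list = field(default_factory=list)

    def sync_down(self, server: Server) -> None:
        self.local_content = server.content
        self.base_version = server.version

    def save(self, text: str) -> None:
        # Saving never checks for a lock: each user writes to their own copy.
        self.local_content = text

    def sync_up(self, server: Server) -> bool:
        if server.upload(self.local_content, self.base_version):
            self.base_version = server.version
            return True
        self.conflicts.append(self.local_content)  # left for manual resolution
        return False


server = Server()
alice, bob = Client("alice"), Client("bob")
alice.sync_down(server)
bob.sync_down(server)

# Case 1: sequential writes, the last writer wins.
alice.save("original + Alice")
alice.sync_up(server)
bob.sync_down(server)        # Bob's client fetches Alice's version ...
bob.save("original + Bob")   # ... but his still-open editor saves its stale buffer.
bob.sync_up(server)
print(server.content)        # original + Bob

# Case 2: both edit before either upload arrives, so a conflict arises.
alice.sync_down(server)
alice.save("Alice again")
bob.save("Bob again")
print(alice.sync_up(server))  # True
print(bob.sync_up(server))    # False
print(bob.conflicts)          # ['Bob again']
```

In the first case Alice's changes disappear without any warning. In the second case Bob's version is set aside and someone has to merge it by hand.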
And now what?
The file locks created by Office products are essentially nothing more than lock files, which could
theoretically be synchronized down to the local system. We experimented with this a few years ago, but
unfortunately, it proved to be very unreliable. The problem is latency: when the first user opens the
file, the lock file is created. It then has to be synchronized up to the server and distributed
to all other clients. Some users may have a slow client system or a slow internet
connection, so distributing a lock file can take a few seconds, but it can also take considerably
longer. A feature that is this unreliable isn’t a feature at all; it gives users a false sense of security.
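The latency problem can be made concrete with a minimal timing model. The function names and all delay values below are assumptions chosen for illustration; they are not measurements from Sciebo.

```python
def lock_arrival_time(created_at_s: float,
                      upload_delay_s: float,
                      download_delay_s: float) -> float:
    """Time at which a synchronized lock file becomes visible on another client."""
    return created_at_s + upload_delay_s + download_delay_s


def sees_lock(open_time_s: float,
              created_at_s: float,
              upload_delay_s: float,
              download_delay_s: float) -> bool:
    """True if the second user opens the file after the lock file has arrived."""
    arrival = lock_arrival_time(created_at_s, upload_delay_s, download_delay_s)
    return open_time_s >= arrival


# The first user opens the file at t = 0 s, the second user at t = 5 s.
print(sees_lock(5.0, 0.0, upload_delay_s=1.0, download_delay_s=1.0))   # True
print(sees_lock(5.0, 0.0, upload_delay_s=2.0, download_delay_s=30.0))  # False
```

With fast connections the second user happens to see the lock in time. With one slow client the same sequence of actions leaves the file unprotected, which is exactly the false sense of security described above.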
Therefore, the only viable alternative at present is to move collaborative document editing to the web.
We host a suitable office suite for this purpose. The data is accessed centrally there, thus avoiding conflicts
as much as possible.
Why not Samba?
In theory, the data could also be made available as a Samba share. Unfortunately, this doesn’t work smoothly with Sciebo and related cloud systems. The main reason is the metadata database that manages the data in Sciebo. When data is edited via a client or the web interface, the resulting changes (file path, owner, timestamp, checksum, etc.) are recorded in this database. A Samba server cannot communicate with the database, so Sciebo would not be notified of changes, and the checksums and other entries for the affected files would no longer be correct. Background jobs that could track such changes and update the database in a timely manner would require an extremely large amount of resources. The system would be severely slowed down and would also become very unreliable.
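The following sketch illustrates why a write that bypasses the metadata database causes trouble. It is a toy model using two in-memory dictionaries; the function names and the choice of SHA-256 are assumptions for this example and do not describe Sciebo's actual database schema or checksum algorithm.

```python
import hashlib

storage = {}   # path -> file content, stands in for the storage system
metadata = {}  # path -> checksum, stands in for the metadata database


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def write_via_sciebo(path: str, data: bytes) -> None:
    """A write through a client or the web interface updates both places."""
    storage[path] = data
    metadata[path] = checksum(data)


def write_bypassing_database(path: str, data: bytes) -> None:
    """A write from a separate file server only touches the storage."""
    storage[path] = data


def is_consistent(path: str) -> bool:
    """Check whether the recorded checksum still matches the stored data."""
    return metadata.get(path) == checksum(storage[path])


write_via_sciebo("report.txt", b"first version")
print(is_consistent("report.txt"))   # True

write_bypassing_database("report.txt", b"edited behind the database's back")
print(is_consistent("report.txt"))   # False
```

Detecting such mismatches after the fact would mean recomputing checksums for every file on a regular basis, which is the resource-intensive background work mentioned above.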
But there’s WebDAV, right?!
Yes, that’s true. It’s a protocol that Sciebo supports in principle. However, it’s not trivial to use. The
probability of data loss is relatively high if you don’t know exactly what you’re doing. We therefore advise our
users against using this interface, and we don’t offer official support for it.
Those who know exactly what they’re doing can access Sciebo via WebDAV using tools like rclone. However, we
discourage this, as it can tie up a lot of system resources, which is unfair to all other users. If, for example,
we see that such access is putting too much strain on the servers, we are forced to deactivate the account of
the user in question.