Ideas for a new Backup program
Backup the backupers
The backup application should be backupable too!
Otherwise you can't restore your backups 10 years later, because the required libraries or script interpreters are no longer available.
As a consequence, the application can't be written in or rely on Python/Bash/… scripts.
annotate-only: The software will not be included in the backups. Only the hash of the application used will be annotated to the snapshot. You can later try to find or compile the correct version of the backup program on your own.
include-encrypted: The application used will be stored encrypted at the backup location. This won't leak any information (e.g. use of an insecure version of the backup software). How you decrypt the application without the application is up to you. Alternatively, you could back up the software with gpg on your own.
include (default): Save an unencrypted, but signed, copy of the backup application used. You should “never” use this plain copy without checking the signature. Therefore it's saved non-executable. Use … to get a verified, executable copy.
include-executable: Same as “include”, but the file is saved as an executable. A signature will still be created.
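The modes above all revolve around a hash of the backup program that wrote the snapshot. A minimal sketch of that idea follows, with Python used only as notation (the notes above rule it out for the tool itself). The names `annotate_snapshot`, `make_verified_executable` and `program_sha256`, and the choice of SHA-256, are assumptions for illustration. The plain hash comparison is a stand-in for the real signature check and is only trustworthy if the snapshot itself is signed.

```python
import hashlib
import hmac
import os
import stat


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def annotate_snapshot(snapshot, program_path):
    """annotate-only: record just the hash of the program that wrote the snapshot."""
    annotated = dict(snapshot)  # leave the caller's dict untouched
    annotated["program_sha256"] = file_sha256(program_path)
    return annotated


def make_verified_executable(stored_copy, snapshot):
    """Set the executable bit only if the stored copy matches the annotation."""
    actual = file_sha256(stored_copy)
    if not hmac.compare_digest(actual, snapshot["program_sha256"]):
        raise ValueError("stored program does not match the snapshot annotation")
    mode = os.stat(stored_copy).st_mode
    os.chmod(stored_copy, mode | stat.S_IXUSR)
    return stored_copy
```

Keeping the stored copy non-executable until this check passes mirrors the "include" default; "include-executable" would skip the `chmod` step because the bit is already set.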
Not written in a scripting language
Don't rely on scripting languages
Statically linked
No dynamically linked version
Reproducible builds
Annotate Hash of used backup application
Use the backed-up application instead of the locally installed and updated application to continue an incomplete backup.
Users?/Repo? should be able to force?/configure? the backup of the backup software.
Save a compressed version of the backup program?
Multiple Repo Versions
A Repo Version could be in one of the following states:
Active: New blobs will be saved in this Version. The maintenance routine will also convert old blobs to this Version (really? How to track it?)
Stand-By: This Version won't be touched. It will not get any new data, but the old data remains.
Delete: Blobs of this Version will be deleted by the maintenance routine.
A Repo Version can have a different crypto setup (master key, algorithms, …) or a different repo format.
Use cases:
Change master key (periodically, after you removed a user, …)
Update Repo/Backup software
Migrate compromised crypto
Keep backup crypto.
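The three states and the "change master key" use case can be sketched as a small state model. This is only an illustration: `RepoVersion`, `rotate`, `writable_version` and the `crypto_setup` label are hypothetical names, and the rule that exactly one Version is Active at a time is an assumption the notes do not state.

```python
from dataclasses import dataclass
from enum import Enum


class VersionState(Enum):
    ACTIVE = "active"      # receives new blobs
    STAND_BY = "stand-by"  # kept readable, never modified
    DELETE = "delete"      # blobs are removed by the maintenance routine


@dataclass
class RepoVersion:
    number: int
    state: VersionState
    crypto_setup: str  # hypothetical label for master key / algorithms


def rotate(versions, crypto_setup, retire_old=False):
    """Add a new Active version and demote the previously Active one."""
    demoted = VersionState.DELETE if retire_old else VersionState.STAND_BY
    for version in versions:
        if version.state is VersionState.ACTIVE:
            version.state = demoted
    next_number = max((v.number for v in versions), default=0) + 1
    new_version = RepoVersion(next_number, VersionState.ACTIVE, crypto_setup)
    versions.append(new_version)
    return new_version


def writable_version(versions):
    """Return the single version that may receive new blobs."""
    active = [v for v in versions if v.state is VersionState.ACTIVE]
    if len(active) != 1:
        raise RuntimeError("expected exactly one active repo version")
    return active[0]
```

A periodic key change would call `rotate(versions, "new-key")` and leave the old Version in Stand-By, while migrating away from compromised crypto would pass `retire_old=True` so the maintenance routine can clear out the old blobs.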
User keys
Simple Personal backups: Password + Only Remote
Server backup to a foreign location: Keyfile + Only local
Server backup to a controlled location: local Keyfile + remote ???
A user (each user?) must be able to change the repo master key.
This new key has to be distributed to every other user without knowing (or having access to) their credentials → asymmetric crypto.
Keep history of configuration? Signed by user keys?
Does it make sense to have different repo versions (with different keys) that are all accessible via one single config, encrypted with only one key?
Forensics
An admin should be able to do some simple sanity checks without having to decrypt everything. This has several use cases:
Filesystem is still OK: Check “integrity” (compare filename with its hash → filename should be the complete hash (easier for scripts)).
Filenames got scrambled: Reconstruct repo structure by computing hashes.
Filesystem got destroyed: Find blobs in a bitstream.
Blobs should be named/identified by their cryptographic hash (fast hash)
The blobs should be named after their hash
Blobs should have an unencrypted …
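The first two sanity checks can be sketched without any keys, as long as the name is the hash of the stored (still encrypted) bytes. The sketch below assumes a flat directory of blobs, SHA-256 as the naming hash and blobs small enough to read into memory. `check_integrity` and `rebuild_names` are hypothetical helper names.

```python
import hashlib
import os
import shutil


def blob_name(data):
    """Name a blob by the full hex digest of its stored bytes."""
    return hashlib.sha256(data).hexdigest()


def check_integrity(repo_dir):
    """Return the names of files whose content hash differs from their name."""
    damaged = []
    for name in sorted(os.listdir(repo_dir)):
        with open(os.path.join(repo_dir, name), "rb") as handle:
            if blob_name(handle.read()) != name:
                damaged.append(name)
    return damaged


def rebuild_names(scrambled_dir, output_dir):
    """Copy every file into output_dir under its content hash.

    Copying into a fresh directory avoids overwriting a blob whose
    scrambled name happens to be another blob's correct name.
    """
    os.makedirs(output_dir, exist_ok=True)
    mapping = {}
    for name in sorted(os.listdir(scrambled_dir)):
        source = os.path.join(scrambled_dir, name)
        with open(source, "rb") as handle:
            correct = blob_name(handle.read())
        shutil.copyfile(source, os.path.join(output_dir, correct))
        mapping[name] = correct
    return mapping
```

Because the full hash is the filename, the same check could also be done with standard command-line hashing tools, which is the "easier for scripts" point above. Finding blobs in a raw bitstream would additionally need some recognisable marker per blob, which this sketch does not cover.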
Misc
The existence of a file (e.g. a blob) doesn't mean anything. Don't rely on it. The file has to be in a signed store! But you should also not write to a file without checking its existence first (hash collisions?).
Save everything read only.
Prevent backup loops (detect our own repo, but securely (signing))
/sha512/… (used hash in filename)
Endianness (should be runnable on ARM/x86/…)
Always define int size/string max length (x86/amd64)
Ideal file size (so blobs won't be fragmented → recover after crash?)
max file name length (strange file systems)
Degraded tree (would names that all start the same degrade the directory entry? Use only the last bytes as the separating directory)
Explicitly state what's incomplete (e.g. in cancelled snapshots)
One should be able to create one's own user account (so you don't have to type your password on the machine of another user). This account has to be added to a repo by another user.
ACL (at least user/admin?)
Sign everything (MAC).
Prevent downgrade attacks (placing of old configs with weak crypto or evil users)
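Several of the points above (explicit endianness, fixed integer sizes, a MAC on everything, rejecting old formats) meet in the on-disk record layout. A minimal sketch follows, assuming a made-up layout of a 4-byte magic, a 16-bit format version and a 64-bit payload length, all little-endian, followed by the payload and an HMAC-SHA256 tag. The magic `BKUP`, the field widths and the function names are illustrative assumptions, not a proposed format.

```python
import hashlib
import hmac
import struct

# "<" fixes little-endian byte order and disables padding, so the
# layout is identical on ARM, x86 and amd64: 4 + 2 + 8 = 14 bytes.
HEADER = struct.Struct("<4sHQ")
MAGIC = b"BKUP"  # hypothetical marker
MAC_SIZE = hashlib.sha256().digest_size  # 32 bytes


def pack_record(key, version, payload):
    """Serialize header + payload and append an HMAC-SHA256 over both."""
    body = HEADER.pack(MAGIC, version, len(payload)) + payload
    return body + hmac.new(key, body, hashlib.sha256).digest()


def unpack_record(key, record, minimum_version=0):
    """Verify the MAC first, then parse; reject versions below the minimum."""
    if len(record) < HEADER.size + MAC_SIZE:
        raise ValueError("record too short")
    body, tag = record[:-MAC_SIZE], record[-MAC_SIZE:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC mismatch")
    magic, version, length = HEADER.unpack_from(body)
    if magic != MAGIC or length != len(body) - HEADER.size:
        raise ValueError("malformed header")
    if version < minimum_version:
        raise ValueError("format version too old (possible downgrade)")
    return version, body[HEADER.size:]
```

A MAC alone does not stop someone from replaying an old, validly signed config, so `minimum_version` would have to come from state the client trusts, for example the highest version it has already seen.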