So, I'm considering a backup system whose goal is to let you easily back up all your data from third-party websites.

The idea I have for its UI is that you enter the URLs for all the sites you use, or feed it a bookmarks file. By examining the URLs, it determines whether it knows how to export data from each site, and prompts for any necessary login info. The result would be a list of sites and their backup status. I think it's important that you be able to enter websites the system doesn't know how to handle yet; backups of such sites will show as failed, which is accurate. A web interface seems almost appropriate, although command-line setup and cron should also be supported.
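A minimal sketch of that setup flow, in Python. Everything here is hypothetical: the handler registry, the plugin names, and the status strings are placeholders, not a real API.

```python
from urllib.parse import urlparse

# Hypothetical registry mapping hostnames to known backup plugins.
KNOWN_HANDLERS = {
    "flickr.com": "flickr-export",
    "delicious.com": "rss-generic",
}

def backup_status(urls):
    """Report which of the entered sites the system knows how to back up."""
    report = {}
    for url in urls:
        host = urlparse(url).hostname or ""
        # Strip a leading "www." so www.flickr.com matches flickr.com.
        host = host.removeprefix("www.")
        if host in KNOWN_HANDLERS:
            report[url] = "pending (%s)" % KNOWN_HANDLERS[host]
        else:
            # Unknown sites stay in the list, marked failed -- which is accurate.
            report[url] = "failed (no plugin)"
    return report
```

The point of keeping unknown sites in the report, rather than rejecting them, is that the failure itself is useful information.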

For the actual data collection, there would be a plugin system. Some plugins could handle fairly generic things, like any site with an RSS feed. Other plugins would have site-specific logic, or would run third-party programs that already exist for specific sites. There are some subtle issues: Just because a plugin manages to suck an RSS feed or FOAF file off a site does not mean all the relevant info for that site is being backed up. More than one plugin might be able to back up parts of a site using different methods. Some methods will be much too expensive to run often.
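One way the plugin interface above could look, as a sketch. The method names and the "cost" and "complete" fields are my invention; the generic RSS plugin is shown as one example, with the caveat from the text that grabbing a feed does not mean the whole site is backed up.

```python
class Plugin:
    cost = 1          # relative expense; a scheduler can skip costly plugins
    complete = False  # does this capture *all* the relevant data for a site?

    def can_handle(self, url):
        raise NotImplementedError

    def backup(self, url, destdir):
        raise NotImplementedError

class RssPlugin(Plugin):
    """Generic plugin: works on any site with a discoverable RSS feed."""
    cost = 1
    complete = False  # an RSS feed is rarely the whole story

    def can_handle(self, url):
        # Real code would fetch the page and look for a feed <link>;
        # this placeholder just accepts everything.
        return True

    def backup(self, url, destdir):
        # Real code would download the feed into destdir.
        pass

def plugins_for(url, plugins):
    """More than one plugin may be able to back up parts of a site."""
    return [p for p in plugins if p.can_handle(url)]
```

Returning all matching plugins, rather than the first, reflects the point that several methods might each capture different parts of a site.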

As far as storing the backup goes, dumping it to a directory as files in whatever formats the sites provide might seem cheap, but it's flexible: you can check the directory into git, or let your regular backup system (which you have, right?) handle it.
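A sketch of that storage approach, assuming a per-site subdirectory layout and an optional git commit after each run; the paths and commit message are illustrative.

```python
import os
import shutil
import subprocess

def store(destdir, site, filename, data):
    """Write one backed-up file under destdir/site/ and optionally commit it."""
    sitedir = os.path.join(destdir, site)
    os.makedirs(sitedir, exist_ok=True)
    with open(os.path.join(sitedir, filename), "wb") as f:
        f.write(data)
    # If git is available and destdir is a repo, version the dump so
    # diffs between backup runs are visible. Failures are ignored:
    # the plain files on disk are the real backup.
    if shutil.which("git"):
        subprocess.run(["git", "-C", destdir, "add", "-A"], check=False)
        subprocess.run(["git", "-C", destdir, "commit", "-q", "-m",
                        "backup of " + site], check=False)
```

Since the files are the canonical output, any backup system pointed at the directory works even if git isn't used at all.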

Thoughts, comments, prior art, cute program names (my idea is "?unsilo"), or am I wasting my time thinking about this?