Reboot

So, things have changed. It’s been eight months, give or take, since my time at University ended and I was thrown head-first into the real world. I live in Leeds, rather than South Wales, and I work for Sky Betting & Gaming (a subsidiary of BSkyB) as a Software Engineer.

After six months of settling in, it’s time things changed again. I will shortly be moving into Leeds City Centre, where the entire apartment will hopefully become an extended project in itself. Less time commuting means more time developing – and I have some very cool projects in the works…

Slight change of plan…

I sat down and rewrote it, fixing most of the bugs in the process. This is a brief run-down before I start on my report today (16 days left…)

The folder issues have been fixed, and it now works to a fairly deep level of nested folders – although I haven’t tested anything extensively.

The filecache may have to be manually cleared in the database on the
first run, after enabling the app in Settings; it should be OK after
that.

Rename and deletion are not currently implemented, but should be quite simple.

My installation required a patch to the filecache class (the modifications can be seen here);
they’re a bit of a ‘kludge’, as it says in the comments. This forces a
refresh of the filecache if the folder is versioned and ‘rolled back’.

One last bug: ‘rolled back’ files in the text editor are read-only,
but clicking Save in Gedit via WebDAV doesn’t respect the read-only.
It makes a new commit against HEAD, so no data is lost, but I’m not
sure what’s causing it.

If I’m lucky, this will be in the Gitorious repository before the feature freeze – which makes me ecstatic 🙂 There’s plenty of stuff left to do, but it works, and it’s fairly robust on my installation. Finally, of course, there’s a copy up at GitHub.

Version Control Update

This was originally going to be a mailing list post, but seemed to be a bit long to bother everyone on the list with 🙂


Hi everyone,

Most importantly, I owe you all a great deal of thanks for the discussion and
suggestions you’ve offered over the last few months – my dissertation would not be
what it is without you! Unfortunately I haven’t finished in time, and now need to
stop development to concentrate on the report – although I *really* don’t want to 🙂

In terms of functionality, the Granite PHP library I developed as part of the project
largely matches Glip [1], but with an architecture I can understand
and extend as necessary. I would very much like to continue work on this library
beyond University, primarily for ownCloud’s benefit. It does not, however, support
push and pull operations at the moment, which was one of my ‘nice-to-haves’.

For the ownCloud implementation, everything is self-contained in an “app” under
apps/files_versioning. The two most important pieces are the versionstorage.php
and versionwrapper.php files:

* The versionwrapper.php file implements a PHP streamwrapper, providing access to
versioned:// URLs in the PHP fopen() (and related) functions.
* The versionstorage.php file extends the OC_Filestorage class to provide
access to the streamwrapper from ownCloud’s filesystem.

As a result, if you enable the files_versioning app in the control panel, a Git
repository is initialised and an initial commit with a README file is added to the
repository. File operations are mirrored to the repository – for example, saving a
file (either through the web text editor or via WebDAV) results in a new commit with
the appropriate changes. Deletion is also implemented, while renames and other
functions still need to be written.

On to what doesn’t work: subdirectories. The last implementation pushed to GitHub [2]
supports the above functionality in a top-level directory called ‘Backup’. New files
can be created and modified; however, directories cannot. I am currently working on a
branch which fixes these issues (there is no problem with the underlying
functionality, it’s simply been a bit rushed these past few weeks…) which I will
upload as soon as reasonably possible.

Also, there is currently no way to roll back to a previous version without
cloning the Git repository and performing the relevant operation. There is an
implementation of history browsing in the Settings > Personal section, although this
isn’t currently wired up to anything. Somewhere on my hard drive is a branched copy
of a prototype repository with Glip that implemented this: I shall attempt to dig it
up at some point.

Basically, I want to do lots and lots of things to my implementation. I’m ecstatic
that it works (especially the fact that the Git binary can’t tell the difference
between my PHP-generated repositories and ‘real’ repositories) – but I’ve run out of
time. I have a job to start in July following graduation, so the next couple of
months are going to be hectic. I’d love to keep working on this once I settle down,
but I wanted to make sure I left an up to date copy of the development status, since
I can no longer afford to work on new features.

One last thank-you to simonbuehler who filed the new folder and $_SERVER['DOCUMENT_ROOT'] bugs on GitHub – that simple act alone will give me bonus points for my Final Report 🙂

Please feel free to clone, take over or otherwise modify my work – it’s all yours. Given the issues outlined above (which I could probably fix in around two weeks – but I don’t have two weeks…) I’m sorry to say it’s not going to be finished for the ownCloud 4 release this month. Perhaps ownCloud 5 🙂

[1] http://github.com/patrikf/Glip
[2] http://github.com/craig0990/ownCloud

P.S. This is a digression concerning the licensing: The git client itself is licensed under the GPL, as is Glip. However, the Ruby library which powers GitHub is MIT-licensed. I would particularly like to license Granite under MIT as well, simply because I don’t like the idea of the GPL – I don’t want to get sued however 🙂 and there are a few small parts of Granite which use code from Glip. Any advice?

Granite and ownCloud (at last)

There’s a prototype implementation of Granite, along with an appropriate OC_Filestorage implementation, available on GitHub. Lots of things still don’t work.

You get a ‘Backup’ folder (I will get around to making this configurable) which is present under /data/{USERNAME}/files/Backup, but there won’t be anything in it. All the magic is in the .git directory. This is initialised when you set up ownCloud or after you enable the app in the settings menu. To begin with it contains a README file with some filler content.

Files can be viewed and downloaded through the web interface, although downloading folders as ZIP files is broken. File modification times are currently linked to the repository, not individual files – that also needs to be fixed. Text files can be edited through the web interface; other files can be accessed over WebDAV, where each save results in a new commit – no maximum frequency yet. There may be an issue with new file creation, although I can’t seem to reproduce it again… Pretty sure this is fixed now…

There is a settings pane under the Settings > Personal section, which shows a list of the 50 most recent commits, but it isn’t actually wired up to anything. It’s just a visual confirmation that it’s working at the moment.

But, basically, it works. Please feel free to give it a go and open up any bugs or issues at GitHub. A final note: the GitHub repository is regularly rebased from the Gitorious master branch, so bear that in mind if you clone it.

Granite v0.2.0

Right. Here we go.

Testing

The tests for the previous release were awful. They required a vfsStreamWrapper for mocking a virtual Git repository, which was much more complicated than actually writing Granite in the first place. The new tests have a bootstrap.php with a TEST_REPOSITORY_PATH constant and an include statement for constants.php.

The tests are currently using the Git repository as a ‘canonical’ test repository (see http://github.com/gitster/git). There are some limitations to this, namely there are no loose objects, which gives less-than-ideal test coverage. Also (although this extends to most recent Git repositories) there are no version 1 index files for packfiles – it may be simpler to hard-code a test index for version 1 indexes.

The constants.php file defines constants for object SHA-1 ids, including loose, packed, tree, commit, blob and tag objects. These can be edited by hand, or alternatively, delete the file and run the tests again. If there is no constants.php file, the bootstrap will generate a new one using generate_constants.php – this script uses exec() calls to git verify-pack -v .
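
The generation step described above can be sketched roughly as follows. This is a Python illustration rather than the actual generate_constants.php: the function names and the `*_SHA1` constant names are hypothetical, and the sample text is invented, although it follows the column layout that git verify-pack -v documents (SHA-1, type, size, size in pack, offset, plus depth and base SHA-1 for deltified objects).

```python
import re

# An object line starts with a 40-character SHA-1 followed by the object type.
# Summary lines such as "non delta: 3 objects" do not match and are skipped.
OBJECT_LINE = re.compile(r"^([0-9a-f]{40})\s+(commit|tree|blob|tag)\s")

# Invented sample in the shape of `git verify-pack -v` output (not real data).
SAMPLE = """\
1111111111111111111111111111111111111111 commit 230 156 12
2222222222222222222222222222222222222222 tree   68 78 168
3333333333333333333333333333333333333333 blob   12 21 246
4444444444444444444444444444444444444444 blob   9 20 267 1 3333333333333333333333333333333333333333
non delta: 3 objects
chain length = 1: 1 object
pack-sample.pack: ok
"""


def first_object_per_type(verify_pack_output):
    """Return {type: sha1}, keeping the first object listed for each type."""
    found = {}
    for line in verify_pack_output.splitlines():
        match = OBJECT_LINE.match(line)
        if match:
            sha1, obj_type = match.groups()
            found.setdefault(obj_type, sha1)
    return found


def render_constants(found):
    """Render one hypothetical NAME = 'sha1' line per object type."""
    return "\n".join(
        f"{obj_type.upper()}_SHA1 = '{sha1}'"
        for obj_type, sha1 in sorted(found.items())
    )


print(render_constants(first_object_per_type(SAMPLE)))
```

Because the constants are derived from whatever the pack contains, pointing the tests at a different repository only means regenerating the file.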

I wanted to be able to switch the test repository at any point, so while it might seem a bit complicated generating constants.php from another PHP script, it was the simplest thing I could think of.

Granite (GitHub)

Complete API change. Any references to getSomething() methods have been replaced with something() instead. PEAR coding standards have been added to the test runner I use (see the PHP_CodeSniffer PEAR project) to try and make the code look a bit more consistent.

The reason for the change is two-fold: I don’t want to change it again, so this API should stick. Adding write support to Granite would have involved adding a set of setSomething() methods, whereas write support can now be added transparently in a kind of fluent interface. For example, in a Commit object, you can fetch the message using the message() method. Adding support for creating a new Commit object will be as simple as adding an extra parameter: message($message = NULL), along with an extra write() method or something similar.
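
As a rough illustration of that combined getter/setter idea, here is a minimal sketch in Python rather than PHP. The `Commit` class below is a stand-in, not Granite's real class, and the `author()` method is an assumption added only to show chaining.

```python
class Commit:
    """Stand-in object with combined getter/setter methods."""

    def __init__(self, message="", author=""):
        self._message = message
        self._author = author

    def message(self, message=None):
        # Called with no argument: behave as a getter.
        if message is None:
            return self._message
        # Called with an argument: behave as a setter and return the
        # object itself, so calls can be chained.
        self._message = message
        return self

    def author(self, author=None):
        # Hypothetical second accessor, following the same pattern.
        if author is None:
            return self._author
        self._author = author
        return self


commit = Commit().message("Initial commit").author("Example Author")
print(commit.message())
```

Existing read-only callers keep working unchanged, because calling a method without arguments still just returns the value.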

Lastly, you should be able to fetch any object out of the repository and use the methods available to trace connections all the way through a repository. As an example of very rushed development, there is an extra repository on my GitHub account which uses the Kohana framework (simply as a rapid development solution) and Granite to present a simple, GitHub-like interface using only pure PHP 5. See OpenHub on GitHub for more info.

Finally…

I’ve tagged this as a v0.2.0-alpha.2 release. The releases follow Semantic Versioning, and v0.2.0 should come with more test coverage. As I said a couple of days ago, now I have a rough idea where I’m headed I’ll be implementing a basic ownCloud reader using Granite.

Coming Soon…

Apologies again for the shortage of updates: January consisted mostly of revision and exams, while this week has been short on work due to a personal issue. All things considered, code clean-up and double-checking is all that’s required on my part to push up a new update; this is largely why I’ve been putting it off – I’ve been nearly there for a while now.

An update should come by the end of next week, including a prototype implementation for ownCloud. Again, my apologies – but please stay tuned 🙂

Granite (v0.1.0)

7. Release early. Release often. And listen to your customers.
– Eric S. Raymond, The Cathedral and the Bazaar

First of all, my apologies for the nearly month-long absence. Most of that time has been spent wrapping my head around binary packing, zlib inflation/decompression (in fact the distinction still eludes me…) and playing with Granite, the product of all this tortuous work.

Granite is a pure PHP library for Git. There were several issues with the libraries I managed to find, the most useful of which was Glip.

Glip’s developer has no intention of supporting ‘push/pull/sync’ operations [1], although an alternative implementation exists [2] that permits pushing over “smart” HTTP. However, the Upliner Glip fork [2] hasn’t been updated for a year, and Glip [1] appears to have stalled.

Ideally, Granite will provide a simple, up-to-date implementation of Git reading and writing, with smart HTTP push and pull support. I’d like to start using this library for a variety of uses besides ownCloud, like a locally-installable version of GitHub.

ownCloud

So how does this fit into ownCloud versioning? Granite can read any object from the repository, including tags/branches (refs). Once I’ve developed some classes to represent each of the major objects, I can start writing a PHP StreamWrapper implementation. This should allow me to ‘tweak’ one of the existing OC_Filestorage providers, allowing access to a Git repository as if it were a local directory.
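
For a sense of what “reading an object” involves in the simplest case, here is a minimal Python sketch of Git’s loose object format. It is not Granite’s code, the function names are made up for illustration, and packed objects need more work than this. A loose object is the zlib-compressed bytes of a short header (`type`, a space, the content length, a NUL byte) followed by the content, and the object id is the SHA-1 of that uncompressed data.

```python
import hashlib
import zlib


def encode_loose_object(obj_type, content):
    """Return (sha1_hex, compressed_bytes) for a loose object."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    store = header + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


def decode_loose_object(compressed):
    """Inflate a loose object and return (type, content)."""
    store = zlib.decompress(compressed)
    header, _, content = store.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("object size does not match its header")
    return obj_type, content


sha1, compressed = encode_loose_object("blob", b"hello world\n")
print(sha1, decode_loose_object(compressed))
```

In a real repository the compressed bytes live under .git/objects/, in a directory named after the first two hex characters of the id and a file named after the remaining 38.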

I’m still fuzzy on the details for ownCloud – I don’t much want to go changing the user interface, that’s somebody else’s code. Ideally, you would enable the ‘Git Integration’ (or whatever) application, mount a repository or two (or add existing folders to new repositories) and then be able to browse the current HEAD. For version rollback/recovery, the application should use a configuration value which points to HEAD by default, which the user can then override to allow the viewing of previous files.

Active or Passive Versioning?

That’s a poor choice of heading, but one of my major concerns is how often to make a new ‘version’. I don’t want it to be too granular (i.e. virtually every write results in a new commit) as repository history can get quite large. On the other hand, I don’t want to leave too long between changes, for obvious reasons.

What do you think? (I know people are reading this!) Client-side projects tend to use file change notification systems (SpiderOak, for example [3]) but that’s not really applicable here. Apple’s Time Machine seems to go for an hourly approach [4]. I’ll look more into the choices made by other projects.
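
One possible middle ground is a minimum interval between commits per file. The sketch below is only an illustration of that idea in Python: the class name and the one-hour figure are assumptions, not decisions made for the project. A write that arrives too soon would still be saved, just without a commit of its own.

```python
class CommitThrottle:
    """Allow at most one commit per path within a minimum interval."""

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self._last_commit = {}  # path -> timestamp of the last commit

    def should_commit(self, path, now):
        """Return True (and record the time) if a commit is due for path."""
        last = self._last_commit.get(path)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_commit[path] = now
        return True


throttle = CommitThrottle(min_interval_seconds=3600)
for timestamp in (0, 600, 1800, 3600, 4000):
    print(timestamp, throttle.should_commit("notes.txt", now=timestamp))
```

Passing the timestamp in explicitly, instead of reading the clock inside the class, keeps the behaviour easy to test.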

Testing

The code is a bit of a shambles at the moment: I’m tired, and it’s just started working, so I want to share it with the world! Beyond all that though, I want to put Granite through its paces before using it to integrate with ownCloud. So please, download it, test it, break it, and tell me all about it. Hopefully the unit tests make sense to people; the tests should run regardless of whether a repository has packed objects or not.

[1] http://lists.fimml.at/glip-devel/0014.html
[2] https://github.com/Upliner/gitphp-glip/wiki/
[3] https://spideroak.com/blog/20091204132500-spideroak-releases-lightweight-filesystem-change-notification-utilities-for-windows-os-x-and-linux-gplv3
[4] https://discussions.apple.com/thread/1949414?start=0&tstart=0

Progress Report (25/11/11)

Finally, the Progress Report which has been distracting me from the rest of my work (including a Ruby on Rails assignment and more Git prototyping) is completed! 4,000+ words detailing the current considerations: future direction, a plan, initial requirements and acceptance tests, and a brief review of version control systems.

The focus for this week is to complete the Rails assignment to a decent standard, and work on completing the first milestone for the 7th December. After that, development should start picking up.

For those interested, it’s a bit of a formal read, but the report is available as a PDF.

DevXS

I spent the weekend of the 11th–13th November at the University of Lincoln for DevXS. The event was sponsored by a variety of University of Lincoln organisations, DevCSI and Amazon Web Services and was apparently a resounding success.

I arrived via train and met up with fellow Aberystwyth students, most of whom are Artificial Intelligence and Robotics students. Being an Open Source Computing student, I was looking for a project with a slightly different focus and found it in Team York from the University of York.

Over 24 hours of constant development we developed the Tasks for Chrome project and submitted it for consideration in the Public Platforms competition, and the general DevCSI competition.

Tasks for Chrome

The final product consisted largely of a PHP API wrapper to the Google Tasks API, which allowed us to do some interesting things with PHP before we talked to Google Tasks. For the Public Platforms competition, Andrew wrote a JSON parser for reading lists that several Universities provide online. Using this, it was a simple matter to build a web interface that allowed simple CRUD (Create, Read, Update, Delete) functionality while also making the import of reading lists trivial.

A Chrome extension was also developed as a proof of concept: it was possible to rapidly develop it with jQuery and AJAX into a toolbar button with a popup window. The popup window could also perform CRUD operations and import reading lists. There are a number of possibilities from here, such as subscribing to data sources instead of using manual import (think integration with bug trackers). Perhaps you could even sync between two or more users, sharing a todo list between a group.

The Experience

The experience of the entire conference was more important, however. The 26 teams produced some incredible software, with some of my personal favourites being Ook Nog and the Unofficial University Guide. The software developed in such a short space of time, by ~200 self-organising individuals, was inspirational.

The people were incredible; mostly Computer Science students with a smattering of other disciplines, everybody was easy-going, willing to help, and in an amazing mood. A particular mention goes to Dave Challis from University of Southampton for helping me with jQuery AJAX callbacks.

After recovering for a few days, DevXS has reinvigorated my software development passion. Yesterday saw me stepping through the Git packfile format and the Glip source code, in order to understand exactly what is going on at the bit/byte level (with an explanatory blog post to follow…) which I’ve been putting off for weeks. PHP developers don’t have to deal with binary formats very often, but it really is so much simpler than it looks.
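
To give a flavour of that bit/byte level, here is a small Python sketch of two pieces of the packfile format: the 12-byte file header, and the variable-length header in front of each object. It is an illustration only, not Glip’s or Granite’s code, and the function names are made up. The bytes decoded at the end are hand-built for the example rather than taken from a real pack.

```python
import struct

# Object type numbers used in packfile entry headers.
TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag",
         6: "ofs_delta", 7: "ref_delta"}


def parse_pack_header(data):
    """Return (version, object_count) from the first 12 bytes of a pack."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    return version, count


def parse_entry_header(data, offset):
    """Return (type_name, inflated_size, offset_of_compressed_data)."""
    byte = data[offset]
    offset += 1
    obj_type = (byte >> 4) & 0x07   # bits 6-4 hold the object type
    size = byte & 0x0F              # bits 3-0 hold the low bits of the size
    shift = 4
    while byte & 0x80:              # bit 7 set: another size byte follows
        byte = data[offset]
        offset += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return TYPES[obj_type], size, offset


# Hand-built example: version 2, one object, a blob of 37 bytes.
example = b"PACK" + struct.pack(">II", 2, 1) + bytes([0xB5, 0x02])
print(parse_pack_header(example))
print(parse_entry_header(example, 12))
```

The zlib-compressed object data starts at the returned offset; delta entries carry extra base information before it, which this sketch does not handle.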

This post is partly to explain the absence of any updates for the last week, and to encourage anybody with free time in February to attend the Dev8D conference, which is bigger, better and free to attend—it really was a life-changing experience.

Common Concepts: Repositories

This post covers the basic storage mechanisms used for the repositories of various systems, along with some advantages/disadvantages of each.

A repository is the database of version information used by a version control system. In centralised systems the repository is on the server, while distributed systems keep a copy on each developer’s machine.

Storage mechanisms for repositories vary, with varying amounts of documentation. Git’s storage and object models are very well described around the web and Mercurial has excellent documentation on the same subject. Bazaar is more difficult to describe, partly because it takes a different approach to the other systems and partly due to a lack of documentation.

There are, broadly speaking, three methods of storing version information:

Snapshots: In snapshot-based systems, each version of a file is saved individually, independent of other versions. This has a few consequences: the repository size quickly grows very large, so compression is typically used to reduce this. This is traded for a single disk access for any version (since they are all stored individually), making access to versions very quick. Git and Bazaar use snapshot-based systems.

Delta Compression: Delta compression stores the difference between two files. Delta storage can be used to efficiently store multiple versions of files, avoiding the large storage issue of snapshot systems (until the repository gets very large anyway). Consequently, accessing old versions takes longer the more version history there is. Mercurial, Subversion and CVS use delta-based methods for repository storage.
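
The access-time trade-off can be seen in a toy Python sketch. It assumes a simplified reverse-delta scheme in which only the newest version is stored in full and each older version is reached by applying one more delta. It is not the on-disk format of any of the systems named above, and the names are invented for illustration.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Describe target as copies from source plus newly inserted lines."""
    delta = []
    matcher = SequenceMatcher(None, source, target)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("insert", target[j1:j2]))
        # "delete" emits nothing: the source lines are simply not copied.
    return delta


def apply_delta(source, delta):
    """Rebuild the target lines from source and a delta."""
    target = []
    for op in delta:
        if op[0] == "copy":
            target.extend(source[op[1]:op[2]])
        else:
            target.extend(op[1])
    return target


class ReverseDeltaStore:
    """Keep the newest version in full and older ones as backward deltas."""

    def __init__(self, first_version):
        self.tip = list(first_version)
        self.back_deltas = []  # back_deltas[i] turns version i+1 into i

    def add(self, lines):
        lines = list(lines)
        self.back_deltas.append(make_delta(lines, self.tip))
        self.tip = lines

    def get(self, version):
        """Walk back from the tip; older versions need more deltas."""
        lines = self.tip
        for delta in reversed(self.back_deltas[version:]):
            lines = apply_delta(lines, delta)
        return lines


store = ReverseDeltaStore(["a", "b"])
store.add(["a", "b", "c"])
store.add(["a", "x", "c"])
print(store.get(0), store.get(2))
```

Fetching the newest version applies no deltas at all, while fetching version 0 applies every delta in the history.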

A technique called version jumping [1] can be used to minimise the access costs, and offers some storage improvements. Subversion has a similar implementation called skip deltas.

Weaves: All version information is stored in a single file, in interleaved blocks. Metadata is added to each interleaved block, indicating the version it belongs to along with some other information. Any version of a file can be reconstructed with a single sequential read, although this takes longer the more interleaved blocks there are (i.e. the more history a file has). The trade-off is having to rewrite the file each time a new version is added. Bazaar used to utilise this method (although it is unclear what it uses now) and the original Source Code Control System had a similar mechanism.
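
A toy weave for a strictly linear history might look like the Python sketch below. It is a simplification for illustration and not the SCCS or Bazaar format: the only metadata kept per line is the version that added it and the version that removed it, and the class and attribute names are invented.

```python
from difflib import SequenceMatcher


class Weave:
    """All versions of one file, interleaved in a single list of lines."""

    def __init__(self):
        # Each entry is [text, added_in, removed_in]; removed_in is None
        # while the line is still present in the newest version.
        self.entries = []
        self.versions = 0

    def get(self, version):
        """Rebuild any version with one sequential pass over the weave."""
        return [text for text, added, removed in self.entries
                if added <= version and (removed is None or version < removed)]

    def add(self, lines):
        """Interleave a new version and return its version number."""
        version = self.versions
        live = [i for i, entry in enumerate(self.entries) if entry[2] is None]
        old = [self.entries[i][0] for i in live]
        inserts = {}  # weave index -> new entries to place before it
        matcher = SequenceMatcher(None, old, lines)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag in ("delete", "replace"):
                for k in range(i1, i2):
                    self.entries[live[k]][2] = version
            if tag in ("insert", "replace"):
                position = live[i1] if i1 < len(live) else len(self.entries)
                inserts[position] = [[text, version, None]
                                     for text in lines[j1:j2]]
        # Adding a version means rewriting the whole weave.
        rebuilt = []
        for index, entry in enumerate(self.entries):
            rebuilt.extend(inserts.get(index, []))
            rebuilt.append(entry)
        rebuilt.extend(inserts.get(len(self.entries), []))
        self.entries = rebuilt
        self.versions += 1
        return version


weave = Weave()
weave.add(["a", "b", "c"])
weave.add(["a", "x", "c"])
weave.add(["a", "x", "c", "d"])
print(weave.get(0), weave.get(2))
```

Both costs described above show up here: `get()` scans every entry however old the requested version is, and `add()` rebuilds the full list.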