[dist-bugs] Hello!
Jesse Vincent
jesse at bestpractical.com
Wed Jul 9 15:26:01 EDT 2008
On Jul 9, 2008, at 2:20 PM, Joey Hess wrote:
> Jesse Vincent wrote:
>> As we've been building Prophet, our peer to peer database and SD (a
>> peer to peer bugtracker on top of Prophet), we've been guided by the
>> following possibly-contentious belief:
>>
>> Distributed source control is all about being able to have many,
>> many, possibly divergent branches of a project and being able to
>> easily share changes. Distributed databases, on the other hand,
>> should be able to incorporate all changes from any peer and
>> percolate toward eventual consistency.
>
> That makes good sense for low-level databases in general, I think. You
> need primitives that can sync together. What about bug tracking
> specifically though? I see that Prophet uses some form of voting to
> resolve conflicts.
The conflict resolver is pluggable, but I think we've got something
fairly neat in the automatic conflict resolver. In Prophet, conflict
resolutions are treated as first-class entities, keyed by fingerprints
of the conflict they resolve. This lets you build out relatively
sophisticated voting strategies and trust relationships for resolutions.
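
To make the idea concrete, here is a minimal sketch in Python of resolutions stored as their own records, keyed by a fingerprint of the conflict, with a simple trust-weighted vote on top. It is an illustration only, not Prophet's code or API: `conflict_fingerprint`, `ResolutionStore`, the SHA-1/JSON fingerprint format, and the trust weights are all assumptions made up for this example.

```python
import hashlib
import json
from collections import defaultdict


def conflict_fingerprint(record_id, prop, candidate_values):
    """Derive a stable key for a conflict (hypothetical format).

    Sorting the candidate values means every peer that sees the same
    conflict computes the same fingerprint, whichever side it saw first.
    """
    canonical = json.dumps([record_id, prop, sorted(candidate_values)])
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


class ResolutionStore:
    """Resolutions kept as first-class records, keyed by conflict fingerprint."""

    def __init__(self):
        # fingerprint -> list of (peer, chosen_value)
        self._votes = defaultdict(list)

    def record(self, fingerprint, peer, chosen_value):
        """Store one peer's resolution of the conflict."""
        self._votes[fingerprint].append((peer, chosen_value))

    def resolve(self, fingerprint, trust=None):
        """Return the value with the highest total trust weight.

        Peers missing from `trust` count as 1.0. Returns None when no
        resolution has been recorded for this fingerprint.
        """
        trust = trust or {}
        tally = defaultdict(float)
        for peer, value in self._votes.get(fingerprint, []):
            tally[value] += trust.get(peer, 1.0)
        if not tally:
            return None
        # Iterate in sorted order so ties break the same way on every peer.
        return max(sorted(tally), key=lambda v: tally[v])


store = ResolutionStore()
fp = conflict_fingerprint("bug-42", "status", ["open", "resolved"])
store.record(fp, "alice", "resolved")
store.record(fp, "bob", "open")
store.record(fp, "carol", "resolved")

print(store.resolve(fp))                       # plain majority vote
print(store.resolve(fp, trust={"bob": 5.0}))   # bob's vote outweighs the rest
```

Because the resolutions are ordinary records, they can be synced between peers like any other data, and each peer can apply its own trust weights when tallying them.
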
> Does that mean that SD can only track one state for a
> bug, and if there's a conflict over the state, someone wins? Or can SD
> track multiple states?
With Prophet, it's relatively easy to add more tables and create
records in those tables with foreign keys back to some other table.

I don't think it's wise to model the state of a bug (or of a whole
ticket database) on different branches of a project as branches
themselves, mostly because the questions I most often want to ask
about a bug are "is it fixed? by whom? in what commit?" I really want
all the branch metadata attached to the bug, as opposed to the bug
metadata attached to the branch.

In the SD world, the plan for handling fixes on different branches is
to use a secondary table with one row per SCM repository/branch/status,
but that's not something we've done yet.
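
A rough sketch of what such a secondary table could look like, written against SQLite from Python purely for illustration. Prophet is not a SQL database and SD has not implemented this, so the table names, column names, and sample rows below are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE bug (
    id      TEXT PRIMARY KEY,
    summary TEXT NOT NULL
);
-- One row per bug / SCM repository / branch, pointing back at the bug.
CREATE TABLE bug_branch_status (
    bug_id     TEXT NOT NULL REFERENCES bug(id),
    repository TEXT NOT NULL,
    branch     TEXT NOT NULL,
    status     TEXT NOT NULL,
    fixed_by   TEXT,
    fix_commit TEXT,
    PRIMARY KEY (bug_id, repository, branch)
);
""")

with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO bug VALUES (?, ?)",
                 ("bug-42", "Crash on empty input"))
    conn.executemany(
        "INSERT INTO bug_branch_status VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("bug-42", "main-repo", "trunk", "fixed", "alice", "abc123"),
            ("bug-42", "main-repo", "stable", "open", None, None),
        ],
    )


def branch_states(conn, bug_id):
    """Answer "is it fixed? by whom? in what commit?" for every branch."""
    cursor = conn.execute(
        "SELECT repository, branch, status, fixed_by, fix_commit "
        "FROM bug_branch_status WHERE bug_id = ? "
        "ORDER BY repository, branch",
        (bug_id,),
    )
    return cursor.fetchall()


for row in branch_states(conn, "bug-42"):
    print(row)
```

The bug stays a single record, and the per-branch rows hang off it, so the branch metadata is attached to the bug rather than the other way around.
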
> I think that managing multiple states for a single bug is important,
> since bugs can apply to many different, divergent, or even
> unrelated[1] code bases. You definitely want to make it easy to agree
> on a state for a bug, when there really is a common state, but it
> seems to me that tracking disagreements and divergent states is just
> as important.
Yep.
>
> BTW, in one of your slides you mention targeting scaling to the
> order of 50 thousand bugs. That seems a bit low -- Debian has an
> order of magnitude more. What's the main scalability issue, is it
> data syncing, or the merging algorithm? Perhaps keeping half a
> million bugs siloed away in a single BTS is something we want to get
> away from, so I'm just curious. :-)
Mostly, that number was drawn out of a hat. There's nothing
fundamentally 'wrong' with Prophet which will keep it from scaling far
larger, but I mostly wanted to impress upon people that distributed
tools need to scale differently than monolithic, centralized tools and
that Prophet isn't trying to take on BigTable, MySQL, Postgres and
Oracle. (It also doesn't help that we're not done with our query
indexing support, so searching 500,000 bugs might take a few minutes.)
I wouldn't expect that there are many folks who would want to mirror
all of the Debian BTS onto their laptops. It's much more important,
I think, to make it easy to mirror the bits you care about and resync
whenever and wherever you choose.
Jesse