[dist-bugs] Hello!

Jesse Vincent jesse at bestpractical.com
Wed Jul 9 15:26:01 EDT 2008


On Jul 9, 2008, at 2:20 PM, Joey Hess wrote:

> Jesse Vincent wrote:
>> As we've been building Prophet, our peer to peer database and SD (a
>> peer to peer bugtracker on top of Prophet), we've been guided by the
>> following possibly-contentious belief:
>>
>> Distributed source control is all about being able to have many,
>> many, possibly divergent branches of a project and being able to
>> easily share changes.  Distributed databases, on the other hand,
>> should be able to incorporate all changes from any peer and percolate
>> toward eventual consistency.
>
> That makes good sense for low-level databases in general, I think. You
> need primitives that can sync together. What about bug tracking
> specifically though? I see that Prophet uses some form of voting to
> resolve conflicts.

The conflict resolver is pluggable, but I think we've got something  
fairly neat in the automatic conflict resolver.  In Prophet, conflict  
resolutions are treated as first-class entities, keyed by fingerprints  
of the conflict they resolve. This lets you build out relatively  
sophisticated voting strategies and trust relationships for resolutions.
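A minimal Python sketch of the idea, assuming a conflict can be described by the record, the property, its original value, and the competing new values. This is not Prophet's actual code: `Conflict`, `ResolutionStore`, the SHA-1-over-canonical-JSON fingerprint, and the trust-weighted vote are hypothetical stand-ins chosen to show how keying resolutions by a conflict fingerprint lets peers share and count them.

```python
import hashlib
import json
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Conflict:
    """Hypothetical conflict: peers changed one property to different values."""
    record_uuid: str
    prop: str
    original: str
    candidates: Tuple[str, ...]  # competing new values, in any order

    def fingerprint(self) -> str:
        # Canonicalize (sorted candidates, sorted keys) so every peer that
        # sees the same conflict derives the same fingerprint, whatever
        # order the changes arrived in.
        canonical = json.dumps(
            {
                "record": self.record_uuid,
                "prop": self.prop,
                "original": self.original,
                "candidates": sorted(self.candidates),
            },
            sort_keys=True,
        )
        return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


@dataclass
class ResolutionStore:
    """Hypothetical store: resolutions are records keyed by conflict fingerprint."""
    votes: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def record_resolution(self, conflict: Conflict, peer: str, chosen: str) -> None:
        if chosen not in conflict.candidates:
            raise ValueError("a resolution must pick one of the candidate values")
        self.votes.setdefault(conflict.fingerprint(), []).append((peer, chosen))

    def resolve(self, conflict: Conflict, trust: Dict[str, float]) -> Optional[str]:
        # Trust-weighted vote: peers missing from `trust` count with weight 1.0.
        tally: Counter = Counter()
        for peer, chosen in self.votes.get(conflict.fingerprint(), []):
            tally[chosen] += trust.get(peer, 1.0)
        if not tally:
            return None
        # Iterate in sorted order so ties go to the smallest value on every peer.
        return max(sorted(tally), key=lambda value: tally[value])


if __name__ == "__main__":
    a = Conflict("ticket-42", "status", "open", ("closed", "stalled"))
    b = Conflict("ticket-42", "status", "open", ("stalled", "closed"))
    store = ResolutionStore()
    store.record_resolution(a, "alice", "closed")
    store.record_resolution(b, "bob", "stalled")   # same fingerprint as `a`
    store.record_resolution(a, "carol", "closed")
    print(store.resolve(a, trust={}))              # closed (2.0 to 1.0)
    print(store.resolve(a, trust={"bob": 3.0}))    # stalled (3.0 to 2.0)
```

Because the fingerprint ignores the order of the candidate values, two peers that discover the same conflict independently end up voting on the same key, and changing the trust weights changes the outcome without touching the stored resolutions.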

> Does that mean that SD can only track one state for a
> bug, and if there's a conflict over the state, someone wins? Or can SD
> track multiple states?

With Prophet, it's relatively easy to add more tables and create  
records in those tables with foreign keys back to some other table.

I don't think that modeling bug (or whole ticket database) states on  
different branches of a project as branches themselves is a wise thing  
to do, mostly because the question I most often want to ask about a  
bug is "is it fixed? by whom? in what commit?"  I really want all the  
branch metadata attached to the bug - as opposed to the bug metadata  
attached to the branch.

In the SD world, the plan for handling fixes on different branches is
to use a secondary table with one row per SCM repository/branch/status,
but that's not something we've done yet.
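One way such a secondary table could look, sketched with Python's built-in `sqlite3` purely for illustration and not as SD's planned schema. The table names, column names, and sample rows are all hypothetical. Each row hangs a repository/branch status off a bug through a foreign key, so "is it fixed? by whom? in what commit?" becomes a single query per bug.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE bugs (
    uuid    TEXT PRIMARY KEY,
    summary TEXT NOT NULL
);
-- One row per (bug, repository, branch): branch metadata hangs off the bug.
CREATE TABLE bug_branch_status (
    bug_uuid   TEXT NOT NULL REFERENCES bugs(uuid),
    repository TEXT NOT NULL,
    branch     TEXT NOT NULL,
    status     TEXT NOT NULL,
    fixed_by   TEXT,
    commit_id  TEXT,
    PRIMARY KEY (bug_uuid, repository, branch)
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the link back to bugs
    conn.executescript(SCHEMA)
    return conn


def branch_statuses(conn: sqlite3.Connection, bug_uuid: str) -> List[Tuple]:
    """Is it fixed? By whom? In what commit? One answer per repository/branch."""
    return conn.execute(
        """SELECT repository, branch, status, fixed_by, commit_id
           FROM bug_branch_status
           WHERE bug_uuid = ?
           ORDER BY repository, branch""",
        (bug_uuid,),
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    conn.execute("INSERT INTO bugs VALUES (?, ?)", ("b1", "crash on empty config"))
    conn.executemany(
        "INSERT INTO bug_branch_status VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("b1", "upstream", "master", "fixed", "alice", "a1b2c3d"),
            ("b1", "upstream", "stable-1.x", "open", None, None),
            ("b1", "debian", "sid", "fixed", "bob", "e4f5a6b"),
        ],
    )
    for row in branch_statuses(conn, "b1"):
        print(row)
```

The composite primary key allows exactly one status per bug per repository/branch, which is the "branch metadata attached to the bug" shape rather than a copy of the bug per branch.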

> I think that managing multiple states for a single bug is important,
> since bugs can apply to many different, divergent, or even unrelated[1]
> code bases. You definitely want to make it easy to agree on a state for
> a bug, when there really is a common state, but it seems to me that
> tracking disagreements and divergent states is just as important.

Yep.

>
> BTW, in one of your slides you mention targeting scaling to the order
> of 50 thousand bugs. That seems a bit low -- Debian has an order of
> magnitude more. What's the main scalability issue: is it data syncing,
> or the merging algorithm? Perhaps keeping half a million bugs siloed
> away in a single BTS is something we want to get away from, so I'm just
> curious. :-)


Mostly, that number was drawn out of a hat. There's nothing
fundamentally 'wrong' with Prophet that would keep it from scaling much
larger, but I wanted to impress upon people that distributed tools need
to scale differently than monolithic, centralized tools, and that
Prophet isn't trying to take on BigTable, MySQL, Postgres and Oracle.
(It also doesn't help that we're not done with our query indexing
support, so searching 500,000 bugs might take a few minutes.)

I wouldn't expect that there are many folks who would want to mirror  
all of the Debian BTS onto their laptops.   It's much more important,  
I think, to make it easy to mirror the bits you care about and resync  
whenever and wherever you choose.


Jesse

