Regarding my final year thesis

classic Classic list List threaded Threaded
74 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Regarding my final year thesis

Harsh Bhatt
I an Applied Maths student, currently in my final year. In my last 6 months i need to do a thesis something related to Mathematics as i am a Maths student. I have been using gentoo for quite a long time so was thinking to do something related to gentoo. Give me suggestion of what can be done. Anything related to  modeling, simulation or Discreet Mathematics would be a better choice.

Reply | Threaded
Open this post in threaded view
|

Re: Regarding my final year thesis

Jauhien Piatlicki-2
Hi,

Mathematics you said? That's nice. You can, for example, redesign our
portage's dependency solving algorithm, as it is quite slow at the
moment. ) I do not know what it does have inside right now, but using
SAT solver can be a good idea (there is a successful example already:
https://en.opensuse.org/openSUSE:Libzypp_satsolver)

--
Regards,
Jauhien

On 11/06/2014 01:49 PM, Harsh Bhatt wrote:
> I an Applied Maths student, currently in my final year. In my last 6 months i need to do a thesis something related to Mathematics as i am a Maths student. I have been using gentoo for quite a long time so was thinking to do something related to gentoo. Give me suggestion of what can be done. Anything related to  modeling, simulation or Discreet Mathematics would be a better choice.
>


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Regarding my final year thesis

Ciaran McCreesh-4
On Thu, 06 Nov 2014 14:25:46 +0100
Jauhien Piatlicki <[hidden email]> wrote:
> Mathematics you said? That's nice. You can, for example, redesign our
> portage's dependency solving algorithm, as it is quite slow at the
> moment. ) I do not know what it does have inside right now, but using
> SAT solver can be a good idea (there is a successful example already:
> https://en.opensuse.org/openSUSE:Libzypp_satsolver)

A SAT encoding for dependency resolution is a *terrible* idea, for all
kinds of reasons (some of which are Gentoo-specific, and some of which
are not).

* The model would be full of implication constraints, which perform
terribly under unit propagation.

* You can't get decent human-readable explanations of failure out of SAT
solvers.

* You're not just trying to find a correct resolution. You're trying to
find an optimal resolution, with respect to some very difficult
criteria. For example, you don't want to install any extra unrelated
packages. This is very hard to express in SAT if you're going for a
model which preserves consistency.

* Coming up with a legal ordering in SAT is a pain. It's worse if you're
trying to fully solve circular dependencies: if you do, dependency
resolution becomes harder than NP, so you'd at least need a QSAT
solver, not SAT.

* Coming up with a legal resolution isn't always the right thing to do.
Often a legal resolution can be obtained by uninstalling a whole load
of stuff or switching loads of USE flags off. But it's better to give a
good error to the user than to come up with a legal but stupid
resolution. In other words, you *don't* want a complete algorithm, you
want a domain-aware incomplete algorithm.

If you're going to go the toolkit route, you should be using a CP
solver, not a SAT solver. But even then you'd be better off making some
changes and not using plain old MAC, so you're back to writing the
algorithms yourself.

What you need is for someone who understands CP and SAT to write a
resolver using algorithms inspired by how CP and SAT solvers work, but
not just blindly copying them. Doing this well is at least a full year
Masters level project...

--
Ciaran McCreesh

signature.asc (188 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Regarding my final year thesis

Ian Stakenvicius-2
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 06/11/14 08:43 AM, Ciaran McCreesh wrote:

> On Thu, 06 Nov 2014 14:25:46 +0100 Jauhien Piatlicki
> <[hidden email]> wrote:
>> Mathematics you said? That's nice. You can, for example, redesign
>> our portage's dependency solving algorithm, as it is quite slow
>> at the moment. ) I do not know what it does have inside right
>> now, but using SAT solver can be a good idea (there is a
>> successful example already:
>> https://en.opensuse.org/openSUSE:Libzypp_satsolver)
>
> A SAT encoding for dependency resolution is a *terrible* idea, for
> all kinds of reasons (some of which are Gentoo-specific, and some
> of which are not).
>
> [ Snip! ]
>
> What you need is for someone who understands CP and SAT to write a
> resolver using algorithms inspired by how CP and SAT solvers work,
> but not just blindly copying them. Doing this well is at least a
> full year Masters level project...
>


...well, if this is an undergrad project, he could start with the SAT
solver and then do what you recommend for his Masters' .. :)


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iF4EAREIAAYFAlRbkToACgkQ2ugaI38ACPBwYwEAtrXJFaVlf4WSv7eV8N+vX6T9
VFq56sh59LmeJ6+UMJcA/33trhsYdNAoRe6i/RWIIRQw8zyS37lIo6I9bLA7TEPg
=7kZS
-----END PGP SIGNATURE-----

Reply | Threaded
Open this post in threaded view
|

Re: Regarding my final year thesis

Ciaran McCreesh-4
On Thu, 06 Nov 2014 10:18:18 -0500
Ian Stakenvicius <[hidden email]> wrote:
> ...well, if this is an undergrad project, he could start with the SAT
> solver and then do what you recommend for his Masters' .. :)

Naah, SAT is doomed. A (bad) vanilla CP model is doable, but in my
experience of students doing these kinds of projects, SAT and IP look
sufficiently "mathsy" to count as a maths project, but if you hand in a
CP model to a mathematician they'll go "I don't understand, you just
wrote down some stuff describing something. This isn't maths!"...

--
Ciaran McCreesh

signature.asc (188 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Regarding my final year thesis

Jeroen Roovers-3
In reply to this post by Jauhien Piatlicki-2
On Thu, 06 Nov 2014 14:25:46 +0100
Jauhien Piatlicki <[hidden email]> wrote:

> Mathematics you said? That's nice. You can, for example, redesign our
> portage's dependency solving algorithm

More generally perhaps: do something interesting with the portage tree.
If not as directly useful as fixing dependency, a look at how bits of
the tree changed over time (particularly with regard to
inter-dependencies between the bits) could be much more interesting
than regarding any particular snapshot.

The other huge multidimensional tree we have is the bug tracker
database. Several social science majors have already tried to do
something intelligible with the bug tracker data (and failed in my
opinion) so I am confident that someone who doesn't have that socially
oriented view of networks might be able to come up with more outrageous
and interesting viewpoints on how the bug tracker actually works and how
various bits of it interconnect, or doesn't work and don't connect,
respectively.


     jer

Reply | Threaded
Open this post in threaded view
|

Re: Regarding my final year thesis

Harsh Bhatt
Thank you Jauhien Piatlicki, Ciaran McCreesh, Ian Stakenvicius, Jeroen Roovers for your detailed replies.

After reading all the proivded information, I got confused about doing SAT or CP model. Currently i am in 5 th year of Applied Mathematics and i have 6 months of time to complete my work.

> "The other huge multidimensional tree we have is the bug tracker
> database. Several social science majors have already tried to do
> something intelligible with the bug tracker data (and failed in my
> opinion) so I am confident that someone who doesn't have that socially
> oriented view of networks might be able to come up with more outrageous
> and interesting viewpoints on how the bug tracker actually works and how
> various bits of it interconnect, or doesn't work and don't connect,
> respectively."  -- Jeroen Roovers

This idea seems bit interesting, about how the bug tracker works. In this i just need to confirm that how much mathematical aspect can be included. It's a good idea to work on.

Harsh Bhatt



On Friday, 7 November 2014 2:58 AM, Jeroen Roovers <[hidden email]> wrote:


On Thu, 06 Nov 2014 14:25:46 +0100

Jauhien Piatlicki <[hidden email]> wrote:

> Mathematics you said? That's nice. You can, for example, redesign our
> portage's dependency solving algorithm


More generally perhaps: do something interesting with the portage tree.
If not as directly useful as fixing dependency, a look at how bits of
the tree changed over time (particularly with regard to
inter-dependencies between the bits) could be much more interesting
than regarding any particular snapshot.

The other huge multidimensional tree we have is the bug tracker
database. Several social science majors have already tried to do
something intelligible with the bug tracker data (and failed in my
opinion) so I am confident that someone who doesn't have that socially
oriented view of networks might be able to come up with more outrageous
and interesting viewpoints on how the bug tracker actually works and how
various bits of it interconnect, or doesn't work and don't connect,
respectively.


    jer




Reply | Threaded
Open this post in threaded view
|

Portage dependency solving algorithm (WAS: Regarding my final year thesis)

Jauhien Piatlicki-2
In reply to this post by Ciaran McCreesh-4
Hi,

On 11/06/2014 02:43 PM, Ciaran McCreesh wrote:

>
> If you're going to go the toolkit route, you should be using a CP
> solver, not a SAT solver. But even then you'd be better off making some
> changes and not using plain old MAC, so you're back to writing the
> algorithms yourself.
>
> What you need is for someone who understands CP and SAT to write a
> resolver using algorithms inspired by how CP and SAT solvers work, but
> not just blindly copying them. Doing this well is at least a full year
> Masters level project...
>
Yeah, you are right.

What I am interested in is an overview of what algorithm we are using
now. Do we have any documentation about it? As I really would like to
look at some concise document rather than sources.

Also may be we need to discuss how can we improve it, as at the moment
for me it seems one of the biggest problems with Gentoo. And afaik
paludis does not solve it (or am I wrong?)

--
Jauhien


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Regarding my final year thesis

Luca Barbato
In reply to this post by Harsh Bhatt
On 07/11/14 06:06, Harsh Bhatt wrote:

> This idea seems bit interesting, about how the bug tracker works.
> In this i just need to confirm that how much mathematical aspect
> can be included. It's a good idea to work on.


Also make might enjoy improvements.

lu

Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm (WAS: Regarding my final year thesis)

Zac Medico-2
In reply to this post by Jauhien Piatlicki-2
On 11/07/2014 01:42 AM, Jauhien Piatlicki wrote:

> Hi,
>
> On 11/06/2014 02:43 PM, Ciaran McCreesh wrote:
>>
>> If you're going to go the toolkit route, you should be using a CP
>> solver, not a SAT solver. But even then you'd be better off making some
>> changes and not using plain old MAC, so you're back to writing the
>> algorithms yourself.
>>
>> What you need is for someone who understands CP and SAT to write a
>> resolver using algorithms inspired by how CP and SAT solvers work, but
>> not just blindly copying them. Doing this well is at least a full year
>> Masters level project...
>>
>
> Yeah, you are right.
>
> What I am interested in is an overview of what algorithm we are using
> now. Do we have any documentation about it? As I really would like to
> look at some concise document rather than sources.

If you install sys-apps/portage with USE=doc, it includes this
documentation which gives an overview of the portage's dependency
resolver algorithms:

        http://dev.gentoo.org/~zmedico/portage/doc/pt02.html

> Also may be we need to discuss how can we improve it, as at the moment
> for me it seems one of the biggest problems with Gentoo. And afaik
> paludis does not solve it (or am I wrong?)
--
Thanks,
Zac

Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm (WAS: Regarding my final year thesis)

Ciaran McCreesh-4
In reply to this post by Jauhien Piatlicki-2
On Fri, 07 Nov 2014 10:42:39 +0100
Jauhien Piatlicki <[hidden email]> wrote:
> Also may be we need to discuss how can we improve it, as at the moment
> for me it seems one of the biggest problems with Gentoo. And afaik
> paludis does not solve it (or am I wrong?)

Paludis solves it. However, Paludis will only ever produce a correct
resolution, which can be a problem since ebuild dependencies are
often garbage...

--
Ciaran McCreesh

signature.asc (188 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm (WAS: Regarding my final year thesis)

Jauhien Piatlicki-2
On 11/07/2014 07:07 PM, Ciaran McCreesh wrote:

> On Fri, 07 Nov 2014 10:42:39 +0100
> Jauhien Piatlicki <[hidden email]> wrote:
>> Also may be we need to discuss how can we improve it, as at the moment
>> for me it seems one of the biggest problems with Gentoo. And afaik
>> paludis does not solve it (or am I wrong?)
>
> Paludis solves it. However, Paludis will only ever produce a correct
> resolution, which can be a problem since ebuild dependencies are
> often garbage...
>
Then the same question for you: where can one read about the algorithm
Paludis uses?

And, again, I have herd (did not try myself) that Paludis is as slow as
Portage.

--
Jauhien


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm (WAS: Regarding my final year thesis)

Ciaran McCreesh-4
On Fri, 07 Nov 2014 19:11:04 +0100
Jauhien Piatlicki <[hidden email]> wrote:
> Then the same question for you: where can one read about the algorithm
> Paludis uses?

It's basically a two stage process: simple constraint solving using
value ordering heuristics to enforce "don't do unnecessary work", then
ordering (which is not quite a graph process, and which is not as
simple as a topological sort, because the tree is full of circular
dependencies).

But the interesting question isn't "what's the algorithm?", it's
"what's the model?". That's where the complexity lies: figuring out how
to turn *DEPEND specifications into constraints is an utter pain, and
it isn't clean or easily understandable. The primary reason is ||
dependencies: developers like to write not-really-correct and utterly
unobvious dependency strings rather than asking for new syntax so they
can just say what they mean...

> And, again, I have herd (did not try myself) that Paludis is as slow
> as Portage.

Well, you're not comparing like with like. Paludis with "everything
turned off" does more than Portage with "everything turned on". If all
you're looking for is the wrong answer as fast as possible, there are
easier ways of getting it...

--
Ciaran McCreesh

signature.asc (188 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Matthias Maier-3
Am 07. Nov 2014, 19:30 schrieb Ciaran McCreesh <[hidden email]>:

> On Fri, 07 Nov 2014 19:11:04 +0100
> Jauhien Piatlicki <[hidden email]> wrote:
>> Then the same question for you: where can one read about the algorithm
>> Paludis uses?
>
> It's basically a two stage process: simple constraint solving using
> value ordering heuristics to enforce "don't do unnecessary work", then
> ordering (which is not quite a graph process, and which is not as
> simple as a topological sort, because the tree is full of circular
> dependencies).
>
> But the interesting question isn't "what's the algorithm?", it's
> "what's the model?". That's where the complexity lies: figuring out how
> to turn *DEPEND specifications into constraints is an utter pain, and
> it isn't clean or easily understandable. The primary reason is ||
> dependencies: developers like to write not-really-correct and utterly
> unobvious dependency strings rather than asking for new syntax so they
> can just say what they mean...


Currently, for portage just to decide that nothing has to be done on my
machine takes around 1 minute.

What is in your opinion the main reason for this? And how can we knock
this down to reasonable speed?

 - Is our dependency model that more complex than the problem resolvers
   of other package managers for other distributions solve?

 - Is it the algorithm that is implemented for the dependency model?

 - Is it its implementation?


>> And, again, I have herd (did not try myself) that Paludis is as slow
>> as Portage.
>
> Well, you're not comparing like with like. Paludis with "everything
> turned off" does more than Portage with "everything turned on". If all
> you're looking for is the wrong answer as fast as possible, there are
> easier ways of getting it...

The last time I compared the resolver speed of portage and paludis both
needed almost the same time.

Do you have a speed comparison with a similar feature set of both? (Or,
alternatively, the speedup one gains by tuning paludis to be as fast as
possible).

Best,
Matthias

Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

hasufell
On 11/07/2014 07:54 PM, Matthias Maier wrote:

>> Well, you're not comparing like with like. Paludis with "everything
>> turned off" does more than Portage with "everything turned on". If all
>> you're looking for is the wrong answer as fast as possible, there are
>> easier ways of getting it...
>
> The last time I compared the resolver speed of portage and paludis both
> needed almost the same time.
>
> Do you have a speed comparison with a similar feature set of both? (Or,
> alternatively, the speedup one gains by tuning paludis to be as fast as
> possible).
>

I think you didn't get the idea: it doesn't make much sense to compare
the speed if the correctness differs.

Also, I don't understand these discussions. The time dependency
resolving takes is marginal compared to the whole update process, no
matter what PM you use.

Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Ciaran McCreesh-4
In reply to this post by Matthias Maier-3
On Fri, 07 Nov 2014 19:54:08 +0100
Matthias Maier <[hidden email]> wrote:
> Currently, for portage just to decide that nothing has to be done on
> my machine takes around 1 minute.

Are you running with or without metadata cache? If you're running
without, it's going to be slow independently of the resolution
algorithm... If you're not:

>  - Is our dependency model that more complex than the problem
> resolvers of other package managers for other distributions solve?

Yes, massively so.

>  - Is it the algorithm that is implemented for the dependency model?

Also a contributing factor, for certain cases. You may see Portage doing
a lot of backtracking sometimes. There's a much better typical-case
algorithm for this.

>  - Is it its implementation?

Also a factor.

The main issue, though, is that getting a "good" resolution out of
crappy data is extremely difficult. There's the Babbage quote:

| On two occasions I have been asked, — "Pray, Mr. Babbage, if you put
| into the machine wrong figures, will the right answers come out?" In
| one case a member of the Upper, and in the other a member of the
| Lower, House put this question. I am not able rightly to apprehend
| the kind of confusion of ideas that could provoke such a question.

Yet this is *exactly* what a dependency resolver has to do for Gentoo,
and it's why dependency resolvers are so complicated.

(For comparison, Paludis on Exherbo will run an order of magnitude
faster for the same set of installed packages, simply because on
Exherbo the input is correct.)

> > Well, you're not comparing like with like. Paludis with "everything
> > turned off" does more than Portage with "everything turned on". If
> > all you're looking for is the wrong answer as fast as possible,
> > there are easier ways of getting it...
>
> The last time I compared the resolver speed of portage and paludis
> both needed almost the same time.

To do different things, though. Portage doesn't have a "produce a
correct resolution" switch. Paludis doesn't (really) have a "produce an
illegal resolution" switch. (Again, assuming you have metadata cache.
If you don't, whole other story.)

--
Ciaran McCreesh

signature.asc (188 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Jauhien Piatlicki-2
In reply to this post by hasufell
On 11/07/2014 08:08 PM, hasufell wrote:

> On 11/07/2014 07:54 PM, Matthias Maier wrote:
>>> Well, you're not comparing like with like. Paludis with "everything
>>> turned off" does more than Portage with "everything turned on". If all
>>> you're looking for is the wrong answer as fast as possible, there are
>>> easier ways of getting it...
>>
>> The last time I compared the resolver speed of portage and paludis both
>> needed almost the same time.
>>
>> Do you have a speed comparison with a similar feature set of both? (Or,
>> alternatively, the speedup one gains by tuning paludis to be as fast as
>> possible).
>>
>
> I think you didn't get the idea: it doesn't make much sense to compare
> the speed if the correctness differs.
>
> Also, I don't understand these discussions. The time dependency
> resolving takes is marginal compared to the whole update process, no
> matter what PM you use.
>
When it compiles in background after all dependencies was solved, it
needs no user intervention. But when I need to solve some blocks or do
some tests during maintaining work, the dependency solving time is what
I care about, as I need to wait for it and then investigate the results.


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Jauhien Piatlicki-2
In reply to this post by Ciaran McCreesh-4

On 11/07/2014 08:21 PM, Ciaran McCreesh wrote:

> The main issue, though, is that getting a "good" resolution out of
> crappy data is extremely difficult. There's the Babbage quote:
>
> | On two occasions I have been asked, — "Pray, Mr. Babbage, if you put
> | into the machine wrong figures, will the right answers come out?" In
> | one case a member of the Upper, and in the other a member of the
> | Lower, House put this question. I am not able rightly to apprehend
> | the kind of confusion of ideas that could provoke such a question.
>
> Yet this is *exactly* what a dependency resolver has to do for Gentoo,
> and it's why dependency resolvers are so complicated.
>
> (For comparison, Paludis on Exherbo will run an order of magnitude
> faster for the same set of installed packages, simply because on
> Exherbo the input is correct.)
>
What;s wrong with input? PMS itself or how do maintainers write ebuilds?
Could you explain?



signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Matthias Maier-3
In reply to this post by Ciaran McCreesh-4

Am 07. Nov 2014, 20:21 schrieb Ciaran McCreesh <[hidden email]>:

> On Fri, 07 Nov 2014 19:54:08 +0100
> Matthias Maier <[hidden email]> wrote:

>> Currently, for portage just to decide that nothing has to be done on
>> my machine takes around 1 minute.
>
> Are you running with or without metadata cache? If you're running
> without, it's going to be slow independently of the resolution
> algorithm...

Yes, I run with metadata cache. Without, ... well I never waited for it
to finish.

> [...]

Thank you very much for the detailed explanation. This helped a lot :-]

>
> (For comparison, Paludis on Exherbo will run an order of magnitude
> faster for the same set of installed packages, simply because on
> Exherbo the input is correct.)
>

This might be a problem that we can tackle, though...

Best,
Matthias

attachment0 (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

hasufell
In reply to this post by Jauhien Piatlicki-2
On 11/07/2014 08:56 PM, Jauhien Piatlicki wrote:

>>
>> I think you didn't get the idea: it doesn't make much sense to compare
>> the speed if the correctness differs.
>>
>> Also, I don't understand these discussions. The time dependency
>> resolving takes is marginal compared to the whole update process, no
>> matter what PM you use.
>>
>
> When it compiles in background after all dependencies was solved, it
> needs no user intervention. But when I need to solve some blocks or do
> some tests during maintaining work, the dependency solving time is what
> I care about, as I need to wait for it and then investigate the results.
>

I see, however... I prefer to have a correct answer instead of an
incorrect one, even if the correct one takes longer.

That goes _especially_ for testing and maintaining work.

Every time people compare portage to paludis I read stuff like "but
paludis is slower". That is incomplete information to put it diplomatic.

Do you really care so much about speed that you don't mind wrong results?

1234