Regarding my final year thesis

classic Classic list List threaded Threaded
74 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Jauhien Piatlicki-2
08.11.14 22:47, hasufell написав(ла):

> On 11/08/2014 10:30 PM, Rich Freeman wrote:
>> On Sat, Nov 8, 2014 at 2:48 PM, hasufell <[hidden email]> wrote:
>>> On 11/08/2014 08:32 PM, hasufell wrote:
>>>>> Sorry to chime in like that but if you don't mind, I'd like to ask for a
>>>>> real-life example for badly declared dependencies with a few words why
>>>>> those are bad and how to make them actually better?
>>>>>
>>>>
>>>> from dev-haskell/hashtables (note "hashtables" != "hashable"):
>>>> || ( ( >=dev-haskell/hashable-1.1:=[profile?]
>>>>        <dev-haskell/hashable-1.2:=[profile?] )
>>>>      ( >=dev-haskell/hashable-1.2.1:=[profile?]
>>>>        <dev-haskell/hashable-1.3:=[profile?] )
>>>>    )
>>>>
>>>> Latest stable version of dev-haskell/hashable is 1.2.1.0.
>>>> On a stable system (arch) the paludis dep-solver will try to match the
>>>> first group first and realize that there is also a stable version
>>>> 1.1.2.5 that matches that group. At that point there is a correct
>>>> solution, but since that involves downgrading a package, it will require
>>>> user-intervention, because it may not be what the user wants.
>>>> (this is the easy scenario... if downgrading causes blockers, you get
>>>> much more interesting output)
>>>>
>>>
>>> To be more specific... it is assumed that hashable-1.2.1.0 is already
>>> installed. Every time the dep solver runs through those packages without
>>> specifying what you want, you will hit the downgrade-problem.
>>
>> I'm missing the problem.  The package requires one of two ranges of
>> hashable versions.  One of them is already installed.  The dependency
>> is satisfied.
>>
>
> The one that is installed (1.2.1.0) is *excluded* by the first group,
> but there is a valid version that fits instead (1.1.2.5).
>
> That's the point where the assumptions start about what the depstring
> means and what the user wants.
>
So the problem is only with intervals? First of all, maintainer would specify higher interval first here and it would solve a problem with possible downgrading. Second, || is rather not for such cases as you've said, so we could ask for a new syntax and solve this problem in the right way in one of the next EAPIs.

Are there any other problems in current model apart from intervals? I would really like to see a list of them all.

--
Jauhien



signature.asc (853 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

hasufell
On 11/08/2014 10:52 PM, Jauhien Piatlicki wrote:

> 08.11.14 22:47, hasufell написав(ла):
>> On 11/08/2014 10:30 PM, Rich Freeman wrote:
>>> On Sat, Nov 8, 2014 at 2:48 PM, hasufell <[hidden email]> wrote:
>>>> On 11/08/2014 08:32 PM, hasufell wrote:
>>>>>> Sorry to chime in like that but if you don't mind, I'd like to ask for a
>>>>>> real-life example for badly declared dependencies with a few words why
>>>>>> those are bad and how to make them actually better?
>>>>>>
>>>>>
>>>>> from dev-haskell/hashtables (note "hashtables" != "hashable"):
>>>>> || ( ( >=dev-haskell/hashable-1.1:=[profile?]
>>>>>        <dev-haskell/hashable-1.2:=[profile?] )
>>>>>      ( >=dev-haskell/hashable-1.2.1:=[profile?]
>>>>>        <dev-haskell/hashable-1.3:=[profile?] )
>>>>>    )
>>>>>
>>>>> Latest stable version of dev-haskell/hashable is 1.2.1.0.
>>>>> On a stable system (arch) the paludis dep-solver will try to match the
>>>>> first group first and realize that there is also a stable version
>>>>> 1.1.2.5 that matches that group. At that point there is a correct
>>>>> solution, but since that involves downgrading a package, it will require
>>>>> user-intervention, because it may not be what the user wants.
>>>>> (this is the easy scenario... if downgrading causes blockers, you get
>>>>> much more interesting output)
>>>>>
>>>>
>>>> To be more specific... it is assumed that hashable-1.2.1.0 is already
>>>> installed. Every time the dep solver runs through those packages without
>>>> specifying what you want, you will hit the downgrade-problem.
>>>
>>> I'm missing the problem.  The package requires one of two ranges of
>>> hashable versions.  One of them is already installed.  The dependency
>>> is satisfied.
>>>
>>
>> The one that is installed (1.2.1.0) is *excluded* by the first group,
>> but there is a valid version that fits instead (1.1.2.5).
>>
>> That's the point where the assumptions start about what the depstring
>> means and what the user wants.
>>
>
> So the problem is only with intervals? First of all, maintainer would specify higher interval first here and it would solve a problem with possible downgrading.

I have a feeling that this is an assumption as well. PMS just says this
is an 'any-of' group. There is not a single word about the processing
order of these specs or which one to prefer, in which case some is
better than the other and so on.

Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Zac Medico-2
On 11/08/2014 02:05 PM, hasufell wrote:

> On 11/08/2014 10:52 PM, Jauhien Piatlicki wrote:
>> 08.11.14 22:47, hasufell написав(ла):
>>> On 11/08/2014 10:30 PM, Rich Freeman wrote:
>>>> On Sat, Nov 8, 2014 at 2:48 PM, hasufell <[hidden email]> wrote:
>>>>> On 11/08/2014 08:32 PM, hasufell wrote:
>>>>>>> Sorry to chime in like that but if you don't mind, I'd like to ask for a
>>>>>>> real-life example for badly declared dependencies with a few words why
>>>>>>> those are bad and how to make them actually better?
>>>>>>>
>>>>>>
>>>>>> from dev-haskell/hashtables (note "hashtables" != "hashable"):
>>>>>> || ( ( >=dev-haskell/hashable-1.1:=[profile?]
>>>>>>        <dev-haskell/hashable-1.2:=[profile?] )
>>>>>>      ( >=dev-haskell/hashable-1.2.1:=[profile?]
>>>>>>        <dev-haskell/hashable-1.3:=[profile?] )
>>>>>>    )
>>>>>>
>>>>>> Latest stable version of dev-haskell/hashable is 1.2.1.0.
>>>>>> On a stable system (arch) the paludis dep-solver will try to match the
>>>>>> first group first and realize that there is also a stable version
>>>>>> 1.1.2.5 that matches that group. At that point there is a correct
>>>>>> solution, but since that involves downgrading a package, it will require
>>>>>> user-intervention, because it may not be what the user wants.
>>>>>> (this is the easy scenario... if downgrading causes blockers, you get
>>>>>> much more interesting output)
>>>>>>
>>>>>
>>>>> To be more specific... it is assumed that hashable-1.2.1.0 is already
>>>>> installed. Every time the dep solver runs through those packages without
>>>>> specifying what you want, you will hit the downgrade-problem.
>>>>
>>>> I'm missing the problem.  The package requires one of two ranges of
>>>> hashable versions.  One of them is already installed.  The dependency
>>>> is satisfied.
>>>>
>>>
>>> The one that is installed (1.2.1.0) is *excluded* by the first group,
>>> but there is a valid version that fits instead (1.1.2.5).
>>>
>>> That's the point where the assumptions start about what the depstring
>>> means and what the user wants.
>>>
>>
>> So the problem is only with intervals? First of all, maintainer would specify higher interval first here and it would solve a problem with possible downgrading.
>
> I have a feeling that this is an assumption as well. PMS just says this
> is an 'any-of' group. There is not a single word about the processing
> order of these specs or which one to prefer, in which case some is
> better than the other and so on.

I think the two obvious algorithms are:

A) If the user's resolver parameters request maximum upgrades, then the
resolver should choose the choice that results the most upgrades.

B) If the user's resolver parameters request minimum change, then the
resolver should choose the choice which results in keeping the most
installed packages in place.

For the dev-haskell/hashtables scenario, the choice on the right wins
regardless of whether you're using algorithm A or algorithm B.
--
Thanks,
Zac

Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Patrick Lauer
In reply to this post by hasufell
On 11/08/2014 10:48 PM, hasufell wrote:

> On 11/08/2014 02:24 PM, Patrick Lauer wrote:
>> On 11/08/2014 03:08 AM, hasufell wrote:
>>> On 11/07/2014 07:54 PM, Matthias Maier wrote:
>>>>> Well, you're not comparing like with like. Paludis with "everything
>>>>> turned off" does more than Portage with "everything turned on". If all
>>>>> you're looking for is the wrong answer as fast as possible, there are
>>>>> easier ways of getting it...
>>>>
>>>> The last time I compared the resolver speed of portage and paludis both
>>>> needed almost the same time.
>>>>
>>>> Do you have a speed comparison with a similar feature set of both? (Or,
>>>> alternatively, the speedup one gains by tuning paludis to be as fast as
>>>> possible).
>>>>
>>>
>>> I think you didn't get the idea: it doesn't make much sense to compare
>>> the speed if the correctness differs.
>>>
>>> Also, I don't understand these discussions. The time dependency
>>> resolving takes is marginal compared to the whole update process, no
>>> matter what PM you use.
>>>
>> *ahem*
>>
>> On my old notebook, which luckily suicided thanks to Lenovo's built in
>> obsolete device detection ...
>>
>> emerge -auNDv world took up to 35 minutes
>>
>> So, if something like RUBY_TARGETS or a random useflag changes, it takes
>> me literally DAYS to figure out a valid solution where portage can
>> figure out an upgrade path.
>>
>> No, it's not marginal.
>>
>
> So we are back to the initial discussion: fix the input instead of
> making the resolver even worse. You can't have both bad input and a fast
> _correct_ resolver.

... ... .... eeeeeeeeh?

If I'm telling portage to not enable mysql support, and some kde bits
through a transitive dependency need the mysql useflag enabled on
$whateverlib, then portage can't find a valid solution because of my
local decisions.

There's nothing gentoo can do in terms of policy or ebuild changes to
avoid that apart from removing all useflags and making me install debian
instead (just to find a variation of that problem with apt)

>
> So you choose between "good enough results which may not be what you
> want" and "pretty good results with lots of things to figure out
> yourself, because of the input and because the resolver does not make
> random assumptions about what you want".

We can choose for "code that works reasonably fast" - portage hasn't
gotten any noticeable work on performance in a while, and people have
just piled on more and more features and complexity. There's no reason
that it takes a minute to start up, so let's focus on fixing that
instead of trying to write a dependency resolution algorithm that
assumes the Riemann Hypothesis is correct.
>
> Both solutions s**k, tbh.
>
> I'd just like people to look at the whole picture and don't keep PM
> discussions narrow-minded by all this NIH crap.
>
It's not about NIH, it's about slow code being slow.


Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Zac Medico-2
On 11/08/2014 03:33 PM, Patrick Lauer wrote:
> We can choose for "code that works reasonably fast" - portage hasn't
> gotten any noticeable work on performance in a while, and people have
> just piled on more and more features and complexity.

Yes, as one of only 2 people who have worked on the resolver in recent
history, I agree with your statement. There have been lots of features
added (including EAPI 5 sub-slot rebuilds), with not much thought to
performance tuning. It will be interesting to do some profiling and see
what we can improve.

> There's no reason that it takes a minute to start up,

If it takes a minute then it probably means that
/var/cache/edb/vdb_metadata.pickle got deleted. In that case, it has to
reload lots of metadata from /var/db/pkg/*/*/*.
--
Thanks,
Zac

Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

hasufell
In reply to this post by Zac Medico-2
On 11/08/2014 11:39 PM, Zac Medico wrote:

> On 11/08/2014 02:05 PM, hasufell wrote:
>> I have a feeling that this is an assumption as well. PMS just says this
>> is an 'any-of' group. There is not a single word about the processing
>> order of these specs or which one to prefer, in which case some is
>> better than the other and so on.
>
> I think the two obvious algorithms are:
>
> A) If the user's resolver parameters request maximum upgrades, then the
> resolver should choose the choice that results the most upgrades.
>

Neither the first nor the second dependency spec group in this example
leads to an upgrade.

> B) If the user's resolver parameters request minimum change, then the
> resolver should choose the choice which results in keeping the most
> installed packages in place.
>

I don't know of any such switch in portage or paludis (I may be wrong,
please point me to it unless you mean --nodeps). Whether I get minimum
change is a side effect of other choices and hardly predictable, afais, no?

Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Zac Medico-2
On 11/08/2014 03:52 PM, hasufell wrote:

> On 11/08/2014 11:39 PM, Zac Medico wrote:
>> On 11/08/2014 02:05 PM, hasufell wrote:
>>> I have a feeling that this is an assumption as well. PMS just says this
>>> is an 'any-of' group. There is not a single word about the processing
>>> order of these specs or which one to prefer, in which case some is
>>> better than the other and so on.
>>
>> I think the two obvious algorithms are:
>>
>> A) If the user's resolver parameters request maximum upgrades, then the
>> resolver should choose the choice that results the most upgrades.
>>
>
> Neither the first nor the second dependency spec group in this example
> leads to an upgrade.

If you consider "not downgrading" equivalent to an upgrade, the then the
second dependency spec wins (the one on the "right").


>> B) If the user's resolver parameters request minimum change, then the
>> resolver should choose the choice which results in keeping the most
>> installed packages in place.
>>
>
> I don't know of any such switch in portage or paludis (I may be wrong,
> please point me to it unless you mean --nodeps).

For portage, absence of the emerge --update parameter will trigger a
"minimum change" behavior for packages that are not matched by argument
atoms.

> Whether I get minimum
> change is a side effect of other choices and hardly predictable, afais, no?

Consider it a "best effort" algorithm. Some other constraints may
override the minimum change request.
--
Thanks,
Zac

Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

hasufell
In reply to this post by Patrick Lauer
On 11/09/2014 12:33 AM, Patrick Lauer wrote:
> It's not about NIH, it's about slow code being slow.
>

Can't disagree more. It's about wrong results, broken systems,
unreadable error messages, days of figuring out ruby, perl, haskell,
multilib, python blockers, incorrect autounmask suggestions and even
missing files from time to time (heard that other portage users hit that
too)...

I think you are approaching the problem from the wrong angle. These
problems are connected as has already been pointed out.

Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Patrick Lauer
On 11/09/2014 08:04 AM, hasufell wrote:

> On 11/09/2014 12:33 AM, Patrick Lauer wrote:
>> It's not about NIH, it's about slow code being slow.
>>
>
> Can't disagree more. It's about wrong results, broken systems,
> unreadable error messages, days of figuring out ruby, perl, haskell,
> multilib, python blockers, incorrect autounmask suggestions and even
> missing files from time to time (heard that other portage users hit that
> too)...
>
> I think you are approaching the problem from the wrong angle. These
> problems are connected as has already been pointed out.
>

So you are saying everything is bad ...

... why are you still here then? :)

(Constructive criticism, it's a art!)

Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

hasufell
On 11/09/2014 01:08 AM, Patrick Lauer wrote:

> On 11/09/2014 08:04 AM, hasufell wrote:
>> On 11/09/2014 12:33 AM, Patrick Lauer wrote:
>>> It's not about NIH, it's about slow code being slow.
>>>
>>
>> Can't disagree more. It's about wrong results, broken systems,
>> unreadable error messages, days of figuring out ruby, perl, haskell,
>> multilib, python blockers, incorrect autounmask suggestions and even
>> missing files from time to time (heard that other portage users hit that
>> too)...
>>
>> I think you are approaching the problem from the wrong angle. These
>> problems are connected as has already been pointed out.
>>
>
> So you are saying everything is bad ...
>

no.


Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Andrew Savchenko-2
In reply to this post by Zac Medico-2
On Sat, 08 Nov 2014 15:45:30 -0800 Zac Medico wrote:

> On 11/08/2014 03:33 PM, Patrick Lauer wrote:
> > We can choose for "code that works reasonably fast" - portage hasn't
> > gotten any noticeable work on performance in a while, and people have
> > just piled on more and more features and complexity.
>
> Yes, as one of only 2 people who have worked on the resolver in recent
> history, I agree with your statement. There have been lots of features
> added (including EAPI 5 sub-slot rebuilds), with not much thought to
> performance tuning. It will be interesting to do some profiling and see
> what we can improve.
>
> > There's no reason that it takes a minute to start up,
>
> If it takes a minute then it probably means that
> /var/cache/edb/vdb_metadata.pickle got deleted. In that case, it has to
> reload lots of metadata from /var/db/pkg/*/*/*.
On old hardware it may take dozens of minutes of CPU time. I have
that *.pickle files, I have sqlite metadata cache, I have 100% CPU
usage. It's not an I/O problem. Just take into account that due to
instruction set, Lx cache, frequency and memory speed difference
old CPU-based system may be 20x times slower than recent one.

Best regards,
Andrew Savchenko

attachment0 (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Andrew Savchenko-2
In reply to this post by Andrew Savchenko-2
On Sat, 8 Nov 2014 13:35:20 +0000 Ciaran McCreesh wrote:
> On Sat, 8 Nov 2014 13:56:21 +0300
> Andrew Savchenko <[hidden email]> wrote:
> > Wanna ~20-30 minutes with sqlite metadata cache enabled?
> > Try on Intel Atom N270 with 2800 packages installed.
> > Dependency resolution is utterly slow.
>
> Well I highly doubt Paludis will be *that* slow...

Last time I give it a try, it was even slower... May be "more
correct", though as far as I understand term "correctness" is
quite vague for current PMS specification. Just add factor ~30 speed
difference compared to recent Core i7 or Xeon or whatever.

Best regards,
Andrew Savchenko

attachment0 (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Zac Medico-2
In reply to this post by Andrew Savchenko-2
On 11/08/2014 10:53 PM, Andrew Savchenko wrote:

> On Sat, 08 Nov 2014 15:45:30 -0800 Zac Medico wrote:
>> On 11/08/2014 03:33 PM, Patrick Lauer wrote:
>>> We can choose for "code that works reasonably fast" - portage hasn't
>>> gotten any noticeable work on performance in a while, and people have
>>> just piled on more and more features and complexity.
>>
>> Yes, as one of only 2 people who have worked on the resolver in recent
>> history, I agree with your statement. There have been lots of features
>> added (including EAPI 5 sub-slot rebuilds), with not much thought to
>> performance tuning. It will be interesting to do some profiling and see
>> what we can improve.
>>
>>> There's no reason that it takes a minute to start up,
>>
>> If it takes a minute then it probably means that
>> /var/cache/edb/vdb_metadata.pickle got deleted. In that case, it has to
>> reload lots of metadata from /var/db/pkg/*/*/*.
>
> On old hardware it may take dozens of minutes of CPU time. I have
> that *.pickle files, I have sqlite metadata cache, I have 100% CPU
> usage. It's not an I/O problem. Just take into account that due to
> instruction set, Lx cache, frequency and memory speed difference
> old CPU-based system may be 20x times slower than recent one.

It could be useful for us to collect profiling data generated on old
hardware. If you'd like to help, you can use python's cProfile module to
generate statistics for us to analyze. The statistics can be nicely
visualized as a shaded call graph by using gprof2dot and graphviz [1].

[1]
http://grasswiki.osgeo.org/wiki/Tools_for_Python_programming#cProfile_profiling_tool
--
Thanks,
Zac

Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Ciaran McCreesh-4
In reply to this post by Matthias Dahl
On Sat, 08 Nov 2014 20:01:54 +0100
Matthias Dahl <[hidden email]> wrote:
> Sorry to chime in like that but if you don't mind, I'd like to ask
> for a real-life example for badly declared dependencies with a few
> words why those are bad and how to make them actually better?

https://bugs.gentoo.org/show_bug.cgi?id=421497

For example. The key message is that developers aren't "writing what
they mean", they're writing "something that appears to work with
Portage if you don't look very hard".

--
Ciaran McCreesh

signature.asc (188 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Ciaran McCreesh-4
In reply to this post by Rich Freeman
On Sat, 8 Nov 2014 16:30:04 -0500
Rich Freeman <[hidden email]> wrote:
> if you have the second dep installed

Unfortunately the notion of "installed" is where most of the mess with
|| dependencies comes from...

What about "not installed yet, but will be installed during this
resolution"?

What about "an earlier version is installed, and will be upgraded
during this resolution"?

What about "an earlier version is installed, and we weren't going to
upgrade it, but we could"?

What about "a version is installed, but with the wrong USE flags"?

What about "a version in a different SLOT is installed"?

What about "it's installed, and we want to upgrade it, but selecting
this would lock us to an old version"?

Paludis has a *massive* list of scoring rules for these sorts of things
to try to do "the right thing" most of the time. Unfortunately there
are situations in the tree with identically-expressed dependencies
where doing one thing is the "right" answer in one case, and the other
thing is the "right" answer in the other.

One small step towards sanity is to stop using ( ) and || ( ) when the
intent is to select a single package and give a choice of how it's
installed (even if it means new syntax). The second step is to abolish
pretty much every use of ||.

--
Ciaran McCreesh

signature.asc (188 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Ciaran McCreesh-4
In reply to this post by Jauhien Piatlicki-2
On Sat, 08 Nov 2014 22:52:24 +0100
Jauhien Piatlicki <[hidden email]> wrote:
> So the problem is only with intervals?

No. Intervals are one of many examples.

> I would really like to see a list of them all.

That could be rather tricky... I doubt anyone has a complete list,
particularly since most of the crazy dependencies more or less work
most of the time. The way to find them is to ask every developer "have
you done something crazy with dependencies?", but that relies upon
developers recognising that trying to be clever is a bad thing.

--
Ciaran McCreesh

signature.asc (188 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Ciaran McCreesh-4
In reply to this post by Patrick Lauer
On Sun, 09 Nov 2014 07:33:44 +0800
Patrick Lauer <[hidden email]> wrote:
> instead of trying to write a dependency resolution algorithm that
> assumes the Riemann Hypothesis is correct.

Thank you for your contribution to this thread. I just realised that if
we assume the Generalised Riemann Hypothesis, we can reduce ordering to
knot equivalence, which would now be in co-NP. Please shower me with
more of your wisdom.

--
Ciaran McCreesh

signature.asc (188 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Andrew Savchenko
In reply to this post by Zac Medico-2
Hello,

On Sun, 09 Nov 2014 00:15:56 -0800 Zac Medico wrote:
> On 11/08/2014 10:53 PM, Andrew Savchenko wrote:
[...]

> > On old hardware it may take dozens of minutes of CPU time. I have
> > that *.pickle files, I have sqlite metadata cache, I have 100% CPU
> > usage. It's not an I/O problem. Just take into account that due to
> > instruction set, Lx cache, frequency and memory speed difference
> > old CPU-based system may be 20x times slower than recent one.
>
> It could be useful for us to collect profiling data generated on old
> hardware. If you'd like to help, you can use python's cProfile module to
> generate statistics for us to analyze. The statistics can be nicely
> visualized as a shaded call graph by using gprof2dot and graphviz [1].
Sorry for delay, I generated samples on two old hosts.

Tarball contains per host directories. Each one contains:
- pstats file;
- generated pdf with call graphs and timing;
- host-related information:
  * emerge --info
  * /proc/cpuinfo
  * /proc/memnifo

*.pstats and *.pdf filename should describe command unambiguously,
e.g. emerge-pv_python:2.7_python:3.3-python-3.3.5-r1 means:
emerge -pv python:2.7 python:3.3 while using python-3.3.5-r1 as
interpreter.

hitomi system was not fully updated for a bit more than a year,
while another one for about half a year. So dependency calculations
may be of different intencities. Sets of packages installed are
similar but not the same:
2502 on hitomi
2953 on desktop

I understand that most interesting data will be from emerge -DNupv
output, but to obtain it I must first fix all blockers and
conflicts, this really takes a lot of time, so please be patient.

And last but not the list: on both systems portage is configured to
use sqlite3 cache for metadata, so there was no I/O delays (CPU
usage about 100% according to top).

In order not to abuse mail list with large attachments, data is
available here:
ftp://brunestud.info/gentoo/portage.tar.xz

Best regards,
Andrew Savchenko

attachment0 (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Zac Medico-2
On 11/17/2014 07:07 PM, Andrew Savchenko wrote:

> Hello,
>
> On Sun, 09 Nov 2014 00:15:56 -0800 Zac Medico wrote:
>> On 11/08/2014 10:53 PM, Andrew Savchenko wrote:
> [...]
>>> On old hardware it may take dozens of minutes of CPU time. I have
>>> that *.pickle files, I have sqlite metadata cache, I have 100% CPU
>>> usage. It's not an I/O problem. Just take into account that due to
>>> instruction set, Lx cache, frequency and memory speed difference
>>> old CPU-based system may be 20x times slower than recent one.
>>
>> It could be useful for us to collect profiling data generated on old
>> hardware. If you'd like to help, you can use python's cProfile module to
>> generate statistics for us to analyze. The statistics can be nicely
>> visualized as a shaded call graph by using gprof2dot and graphviz [1].
>
> Sorry for delay, I generated samples on two old hosts.
>
> Tarball contains per host directories. Each one contains:
> - pstats file;
> - generated pdf with call graphs and timing;
> - host-related information:
>   * emerge --info
>   * /proc/cpuinfo
>   * /proc/memnifo
>
> *.pstats and *.pdf filename should describe command unambiguously,
> e.g. emerge-pv_python:2.7_python:3.3-python-3.3.5-r1 means:
> emerge -pv python:2.7 python:3.3 while using python-3.3.5-r1 as
> interpreter.

Thank you for this data. I will see what I can to do optimize the
problem areas that your data highlights.

> hitomi system was not fully updated for a bit more than a year,
> while another one for about half a year. So dependency calculations
> may be of different intencities. Sets of packages installed are
> similar but not the same:
> 2502 on hitomi

For hitomi, _slot_operator_update_probe/use_reduce is an obvious thing
to optimize. It called use_reduce 53763 times there, so it seems to
repeat use_reduce multiple times for the same packages. That means we
should see a benefit from memoization.

> 2953 on desktop

For desktop, _dynamic_deps_preload is an obvious thing to optimize. You
can avoid this function entirely if you use --dynamic-deps=n. You may
need to run 'emerge @changed-deps' in order to ensure that emerge will
be able to cope with --dynamic-deps=n. IIRC you need at least
sys-apps/portage-2.2.14 for @changed-deps support.

--
Thanks,
Zac

Reply | Threaded
Open this post in threaded view
|

Re: Portage dependency solving algorithm

Andrew Savchenko
Hi,

On Mon, 17 Nov 2014 19:56:02 -0800 Zac Medico wrote:
> On 11/17/2014 07:07 PM, Andrew Savchenko wrote:
[...]

> > Tarball contains per host directories. Each one contains:
> > - pstats file;
> > - generated pdf with call graphs and timing;
> > - host-related information:
> >   * emerge --info
> >   * /proc/cpuinfo
> >   * /proc/memnifo
> >
> > *.pstats and *.pdf filename should describe command unambiguously,
> > e.g. emerge-pv_python:2.7_python:3.3-python-3.3.5-r1 means:
> > emerge -pv python:2.7 python:3.3 while using python-3.3.5-r1 as
> > interpreter.
>
> Thank you for this data. I will see what I can to do optimize the
> problem areas that your data highlights.
>
> > hitomi system was not fully updated for a bit more than a year,
> > while another one for about half a year. So dependency calculations
> > may be of different intencities. Sets of packages installed are
> > similar but not the same:
> > 2502 on hitomi
>
> For hitomi, _slot_operator_update_probe/use_reduce is an obvious thing
> to optimize. It called use_reduce 53763 times there, so it seems to
> repeat use_reduce multiple times for the same packages. That means we
> should see a benefit from memoization.
>
> > 2953 on desktop
>
> For desktop, _dynamic_deps_preload is an obvious thing to optimize. You
> can avoid this function entirely if you use --dynamic-deps=n. You may
> need to run 'emerge @changed-deps' in order to ensure that emerge will
> be able to cope with --dynamic-deps=n. IIRC you need at least
> sys-apps/portage-2.2.14 for @changed-deps support.
 
I use 2.2.14 on both hosts (and usually latest ~x86 portage is
there). I thought that running fixpackages should be enough to run
emerge with --dynamic-deps=n.

I'm afraid I will not be able to run emerge @changed-deps prior to
@world update due to too old packages installed (e.g. versions not
in tree anymore), blockers and unsatisfied deps.

When I'll manage to run emerge -DNupv @world without errors, I'll
send you stats for both runs with and without dynamic deps.

By the way, do you need pstats files (e.g. for some extra data) or
pdf graphs are sufficient?

Best regards,
Andrew Savchenko

attachment0 (836 bytes) Download Attachment
1234