Constraint-Based Dependency Solver for Portage: a prototype

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Constraint-Based Dependency Solver for Portage: a prototype

Michael Lienhardt
Dear Portage developers,

I am a Post-doc in formal methods and software engineering. With my colleagues, we are working on a formal model for software composition, and were looking for a concrete example of such model to motivate and guide our work. I knew portage from using gentoo since 2007, and knew that it is the perfect use case for us.

The first result of our work is a prototype for a constraint-based dependency solver for Portage:
  like the emerge tool, it takes in parameter a list of atoms to install, and computes a full list of packages to install to satisfy the package dependency relation.
Up to bugs, this tool is correct and complete: it will always find a solution if it exists, and always tell if there are none.
For instance, it successfully computed that gnome-base/gnome cannot be installed by default (on a udev system), but found a solution that replaces sys-fs/eudev by sys-apps/systemd when we allow the tool to change the USE flag selection of the packages.

With this prototype, we also compiled (90% of) a documentation on how portage manages package configuration (USE flags declaration, selection, masking, keywording, ...).

Link to the prototype: https://github.com/HyVar/gentoo_to_mspl
Link to the documentation: https://github.com/HyVar/gentoo_to_mspl/blob/master/PORTAGE.md


We would really like to know your opinions, impressions and suggestions about this work.
We would also like to know how useful this tool could be for the community:
  as for now, it is a prototype of a dependency solver (that would definitively need some work to be usable in production), but it also offers the possibility of any kind of formal analysis on the REQUIRED_USE and dependencies in packages, like the one described in https://bugs.gentoo.org/417753
For instance, our tool already checks for obvious reasons (inconsistent REQUIRED_USE or unmet dependencies) causing a package not to be installable. In particular, on the Portage version available in http://www.osboxes.org/gentoo/ , our tool identified 14 packages that could not be installed for these reasons (the full list in in post-scriptum).


Additionally, our implementation is based on what I understood of the portage's documentation, which I compiled in the PORTAGE.md document: it would be very helpful if you could point error that I made or subtleties that I didn't understand or missed.

Best Regards,
Michael Lienhardt


PS: list of uninstallable packages:
  dev-java/jruby-1.7.12
  media-video/nvidia-settings-340.58
  dev-ruby/bitescript-0.0.9
  dev-java/spring-core-3.2.4
  app-i18n/ibus-table-code-1.2.0.20100305
  dev-ruby/weakling-0.0.4
  sci-libs/ogdi-3.1.5-r1
  dev-java/jcs-2.0
  net-misc/asterisk-rate_engine-0.5.4
  games-fps/doom3-mitm-20070129
  app-office/impressive-0.10.5
  dev-java/spring-aop-3.2.4
  dev-ruby/duby-0.0.2-r1
  dev-db/mycli-9999

Reply | Threaded
Open this post in threaded view
|

Re: Constraint-Based Dependency Solver for Portage: a prototype

Alexander Berntsen-2
Unfortunately I am way too busy to even entertain looking into this in
any detail. A few years ago I was hoping to work on something like this,
though in Agda (or possibly in Haskell or Coq), and do work on making
Portage much more modular so that you could actually pick whatever
dependency resolver you wanted -- even going as far as "Don't like the
resolution you were presented now? Try using another resolver" messages.
But alas that job ended before I got very far.

However, I wanted to take the time to drop you a tiny note of
encouragement. I think this looks really great, and I hope someone on
the team finds the time to look into it in more detail than me.
--
Alexander
[hidden email]
https://secure.plaimi.net/~alexander


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re : Re: [gentoo-portage-dev] Constraint-Based Dependency Solver for Portage: a prototype

Michael Lienhardt
Dear Alexander,

Many thanks for your reply and your encouragements.
The point that you raised is very interesting and was partially done in Debian (they defined a wrapper around apt-get instead of refactoring it): http://manpages.ubuntu.com/manpages/zesty/man8/apt-cudf-get.8.html
Part of their work was formalized in coq and implemented in OCaml.
In our case, we don't have any mechanized formalization of our model (maybe in the future).

I too (and my colleagues) hope that someone on the team could have some time to look into our project.
But maybe there are things we can do to help start a dialog, like:
 - reaching in other mailing lists
 - posting on a Gentoo forum
 - participating in a workshop/conference/other where we could directly meet and discuss with the community
 - or simply starting an informal discussion by email where instead of having to look into the Github repository, you could directly ask me

Does anyone have suggestions on that topic?

Again, many thanks.
I really hope that with everyone's feedback, suggestions, and help, we could make something useful from this prototype.


Michael Lienhardt

PS: I forgot in my previous mail to talk about the other persons involved in this project:
 - Jacopo Mauro, Post-doc in UiO (Norway), developer of the solver backend
 - Simone Donetti, Engineer in Unito (Italy), he helped me perform some tests
 - Ferruccio Damiani (Unito), Einar Broch Johnsen and Ingrid Chieh Yu (UiO), our supervisors



Reply | Threaded
Open this post in threaded view
|

Re: Re : Re: [gentoo-portage-dev] Constraint-Based Dependency Solver for Portage: a prototype

Alexander Berntsen-2
On 13/12/17 02:52, [hidden email] wrote:
> But maybe there are things we can do to help start a dialog, like:
>  - reaching in other mailing lists
I don't think a post to gentoo-dev would be remiss in this case.

>  - posting on a Gentoo forum
Always useful, I'm told, though I don't venture there. But that way
you're far more likely to engage *users*.

>  - participating in a workshop/conference/other where we could directly meet and discuss with the community
FOSDEM and Linux Days are probably the best choices.

>  - or simply starting an informal discussion by email where instead of having to look into the Github repository, you could directly ask me
If someone has the time, that'll probably naturally happen through the
MLs. Christmas time tends to be peak bikeshedding hours at Gentoo, so
maybe cross-post to -dev closer to the holidays?
--
Alexander
[hidden email]
https://secure.plaimi.net/~alexander


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Re : Re: [gentoo-portage-dev] Constraint-Based Dependency Solver for Portage: a prototype

Michael Lienhardt
Dear Alexander,

Many thanks for the advices.
I started few discussions on the forum and will go to FOSDEM.
I'll see where it will go.

Best,
Michael

Il 16/12/2017 14:39, Alexander Berntsen ha scritto:

> On 13/12/17 02:52, [hidden email] wrote:
>> But maybe there are things we can do to help start a dialog, like:
>>   - reaching in other mailing lists
> I don't think a post to gentoo-dev would be remiss in this case.
>
>>   - posting on a Gentoo forum
> Always useful, I'm told, though I don't venture there. But that way
> you're far more likely to engage *users*.
>
>>   - participating in a workshop/conference/other where we could directly meet and discuss with the community
> FOSDEM and Linux Days are probably the best choices.
>
>>   - or simply starting an informal discussion by email where instead of having to look into the Github repository, you could directly ask me
> If someone has the time, that'll probably naturally happen through the
> MLs. Christmas time tends to be peak bikeshedding hours at Gentoo, so
> maybe cross-post to -dev closer to the holidays?
>

Reply | Threaded
Open this post in threaded view
|

Re: Re : Re: [gentoo-portage-dev] Constraint-Based Dependency Solver for Portage: a prototype

Alexander Berntsen-2
On 08/01/18 03:06, Michael Lienhardt wrote:
> Many thanks for the advices. I started few discussions on the forum
> and will go to FOSDEM. I'll see where it will go.
Consider mentioning this in #gentoo-fosdem on irc.freenode.net, which is
where the Gentoo devs & users who are going to FOSDEM generally gather.
Maybe you can meet up with everyone. There's usually a few Gentoo
dinners (one actual "Gentoo dinner" that everyone sign sup to, and just
informal dinners throughout the weekend as well), and an OpenPGP
key-signing event, for instance.

Good luck, and enjoy FOSDEM.
--
Alexander
[hidden email]
https://secure.plaimi.net/~alexander


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Constraint-Based Dependency Solver for Portage: again

Michael Lienhardt
In reply to this post by Michael Lienhardt
Dear all,

A few months ago, I got back to my constraint-based dependency solver for portage, that I had to leave for a while.
Thanks to Zac Medico, it is now based on portage itself to query the portage tree, and so the code is far simpler (and far less buggy).
It is on github: https://github.com/gzoumix/pdepa

I still have some work to do on the implementation, and with some colleagues, we are planning to publish it in a conference, with the related theory.
However, to have relevant information to publish, I need your help, if you could answer some questions that will come up during testing.
For instance, in all my tests, emerge (during its dependency resolution) always replaces atoms with the latest version of the pc that matches it, even with all possible backtracking options being selected
 (I noticed this behavior because emerge failed installing a package such that the latest corresponding cpv could be installed, while the previous version could be).
Is it really the default behavior of emerge, and if yes, is there a way to make emerge consider all matching cpv for an atom?

Thank you!
Michael

Reply | Threaded
Open this post in threaded view
|

Re: Constraint-Based Dependency Solver for Portage: again

Zac Medico-2
On 6/20/19 12:02 PM, [hidden email] wrote:
> Dear all,
>
> A few months ago, I got back to my constraint-based dependency solver for portage, that I had to leave for a while.
> Thanks to Zac Medico, it is now based on portage itself to query the portage tree, and so the code is far simpler (and far less buggy).
> It is on github: https://github.com/gzoumix/pdepa

Great!

> I still have some work to do on the implementation, and with some colleagues, we are planning to publish it in a conference, with the related theory.
> However, to have relevant information to publish, I need your help, if you could answer some questions that will come up during testing.
> For instance, in all my tests, emerge (during its dependency resolution) always replaces atoms with the latest version of the pc that matches it, even with all possible backtracking options being selected
>  (I noticed this behavior because emerge failed installing a package such that the latest corresponding cpv could be installed, while the previous version could be).
> Is it really the default behavior of emerge, and if yes, is there a way to make emerge consider all matching cpv for an atom?

It's capable of considering older versions, but maybe there's some
deficiency in the algorithm. We should analyze a specific example in
order to understand the behavior.

Maybe you're referring to this code which forces the highest version in
the event of a conflict:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=a9064d08ef4c92a5d0d1bfb3dc8a01b7850812b0

> Thank you!
> Michael
>
--
Thanks,
Zac


signature.asc (1000 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re : Re: [gentoo-portage-dev] Constraint-Based Dependency Solver for Portage: again

Michael Lienhardt

----- Zac Medico <[hidden email]> a écrit :
> It's capable of considering older versions, but maybe there's some
> deficiency in the algorithm. We should analyze a specific example in
> order to understand the behavior.
>
> Maybe you're referring to this code which forces the highest version in
> the event of a conflict:
>
> https://gitweb.gentoo.org/proj/portage.git/commit/?id=a9064d08ef4c92a5d0d1bfb3dc8a01b7850812b0
>

Yes, this seems to be the cause of the problem, thank you.

For testing, I created two ebuilds (and tested with "emerge -pv --autounmask-backtrack y net-misc/pdepa"):

## net-misc/pdepa-1.0
EAPI=6
KEYWORDS="amd64"
SLOT="1"
IUSE="feature"
REQUIRED_USE=""
DEPEND=""

## net-misc/pdepa-2.0
EAPI=6
KEYWORDS="amd64"
SLOT="1"
IUSE="feature"
REQUIRED_USE="^^ ( feature )" # feature is not set => not installable
DEPEND=""

Like you said, due to SLOT conflict, only net-misc/pdepa-2.0 is tried, and emerge fails.
Changing the SLOT of net-misc/pdepa-2.0 to 2 "solves the issue": emerge succeeds and propose to install net-misc/pdepa-1.0

However, this solution is only local: if in net-misc/pdepa-2.0 I replace
REQUIRED_USE="" # now the package has no configuration problem
DEPEND="!virtual/libc" # but it conflicts with an important library

then emerge fails again, saying that virtual/libc blocks net-misc/pdepa-2.0


I don't know how many actual packages cannot be installed due to this optimization.
I noticed this behavior in a previous version of the portage tree, when trying to install sys-auth/polkit without configuring it:
 - old version: REQUIRED_USE="??( systemd consolekit )"
 - more recent version REQUIRED_USE="^^ ( consolekit, elogind systemd )"
In practice however, this was not a problem, as some dependencies of polkit deep in the tree did require one of ( systemd consolekit ) to be set.

If you want, I can implement a test that check if this optimization is a problem in practice.
Checking the issue shouldn't be too difficult (I think I just need to check that all REQUIRED_USE and *DEPEND are equivalent for a SLOT).

Best,
Michael

Reply | Threaded
Open this post in threaded view
|

Constraint-Based Dependency Solver: initial results

Michael Lienhardt
Dear all,

It's possible that the goal of my current work and its possible benefit for the gentoo community are not very clear.
In my experience, emerge is a very good tool to install new packages whose use flags have already been configured.
However, when the packages are not correctly configured, emerge can guess, without guarantee, some use flags to select/unselect; and unmerging a package seems to be the user's entire responsability.
My work has several goals:
 - provide a correct and complete implementation of a dependency solver for portage that can help debug emerge
 - since the solver is correct and complete, it would provide a "safe" implementation of unmerge, where missing dependencies would be satisfied by the installation of new packages
 - provide a rather small code base that is easy to debug and on which it is easy to prototype some tools, like REQUIRED_USE checks, or interactive use flag configuration
 - be usable, both in term of memory usage and computation time.

I finished implementing the solver and did some preliminary benchs (1000 tests where I checked if a random set of packages -- different for each test -- could be installed), including comparison with emerge in "search" modality, i.e., with the options --autounmask y --autounmask-continue y --autounmask-backtrack y
Since this is preliminary, I wanted to talk to you about it but I don't have every bugs identified yet.
In average, my solver is a bit less than 10 times slower than emerge (not very good, but not bad as it is still far better than a brute force approach, which is more than 100 times slower and takes 3Gb of memory).
It is important to note that my solver is not suited for end user usage yet (the answer it gives is correct, but random -- it includes useless packages, useless package removal and old versions), but it is the first tool that I know of that can correctly and completely (modulo bugs ^^) check emerge's behavior.

A first result: in more than 25% of the tests that can be installed, emerge fails.
Most of these failures come from the fact that even in "search" modality, emerge stops when it sees a REQUIRED_USE that is not correctly configured (I'll post a bug report about it and about the SLOT heuristics in a few days).
I still need to look at the other failures to see what caused them.

Additionally, it seems that when I do "emerge package1 package2", emerge first solves the dependencies of package1, and then of package2.
It seems that when resolving the depenendencies of package2, emerge can end up with a conflict between the choices it made for solving the dependencies of package1 and the requirements of package2.
I say "seems" because my tests were automatically done with a rather long list of packages (so other reasons for the observed failures are possible).
I'll try to pin down the actual emerge behavior and possibly file a bug report.

I am currently porting the tests on docker (to have a safe testing environment) and once that's done, I will be able to give actual bug reports.
In the future, I have the following plan:
 - cleanup the output of the solver, so it wouldn't be random and be usable instead (this is "just" a technical and boring algorithm to implement, but time consuming)
 - install the configuration computed by my solver (I am still unsure that emerge --nodeps would be flexible enough, and maybe I will have to implement my own planner)
 - find a correct abstraction of packages, similar to the SLOT heuristics ( https://gitweb.gentoo.org/proj/portage.git/commit/?id=a9064d08ef4c92a5d0d1bfb3dc8a01b7850812b0 ), to improve efficiency
 - design an interactive package configuration algorithm + UI that would happen during the dependency solving process and really help the user configuring what he wants and let the solver do the rest
 - start reading portage's bug tracker and continue reading its code
 - extend pdepa with other kind of relevant analysis

All comments/suggestions are welcomed.

Best,
Michael

Reply | Threaded
Open this post in threaded view
|

Re: Constraint-Based Dependency Solver: initial results

Michael Orlitzky
On 8/30/19 10:34 AM, [hidden email] wrote:
>
> All comments/suggestions are welcomed.
>

Since no one else has said it yet (?), I think this approach is really
cool and I'm glad someone is working on it.

Modeling difficult computations as abstract optimization problems is a
"hobby" of mine, and I think it will pay off if we can abstract away a
big ugly part of the PM and try to swap it out for something more
principled.

Reply | Threaded
Open this post in threaded view
|

Re: Constraint-Based Dependency Solver: initial results

Zac Medico-2
In reply to this post by Michael Lienhardt
On 8/30/19 7:34 AM, [hidden email] wrote:
>  - install the configuration computed by my solver (I am still unsure that emerge --nodeps would be flexible enough, and maybe I will have to implement my own planner)

One problem with emerge --nodeps is that it does not resolve file
collisions in cases where two packages block each other. Normally, such
collisions are resolved by removing the files from the contents of the
installed package (the installed package is ultimately unmerged in order
to solve the blocker), but with --nodeps the blockers are simply
discarded and emerge treats all file collisions as fatal.

Another problem is that --nodeps eliminates the dependency information
that's needed for scheduling parallel builds with emerge --jobs.
--
Thanks,
Zac


signature.asc (1000 bytes) Download Attachment