[RFC] Adding extra vars to md5-cache, for QA&tooling purposes

Michał Górny
Hi,

TL;DR: I'd like to make it possible for ebuilds to define additional
variables that will be stored in md5-cache.  This would be useful for CI
and other tooling that right now has to parse ebuilds for other data.


The idea is to add a new incremental ebuild/eclass variable (technical
name: QA_EXTRA_CACHE_VARS) that would define additional data to be
stored in cache.  For example, python*-r1 eclasses would define
'PYTHON_COMPAT', acct-user would define 'ACCT_USER_ID', etc.

When regenerating cache, the PM would read this variable, and store
the values of all defined variables into md5-cache.  As a result,
programs needing those variables can get them straight from cache
without having to attempt to run or parse ebuilds (which is both slow
and prone to bugs).
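To illustrate the consumer side, here is a minimal Python sketch of reading a value back from an md5-cache entry. It assumes only that entries keep the existing one-KEY=VALUE-per-line format and that the PM would store the extra variable under its plain name, here PYTHON_COMPAT. Key naming and array encoding are still open questions in this thread, and the sample entry is made up.

```python
from typing import Dict


def parse_md5_cache_entry(text: str) -> Dict[str, str]:
    """Parse an md5-cache entry: one KEY=VALUE pair per line."""
    entry: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first '=' only, since values such as dependency
        # strings may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:
            entry[key] = value
    return entry


# Made-up entry; PYTHON_COMPAT is the proposed extra variable.
sample = (
    "EAPI=7\n"
    "SLOT=0\n"
    "PYTHON_COMPAT=python3_6 python3_7\n"
)
entry = parse_md5_cache_entry(sample)
print(entry.get("PYTHON_COMPAT", "").split())  # ['python3_6', 'python3_7']
```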

This would benefit e.g. gpyutils, which right now has to attempt to parse
PYTHON_COMPAT from ebuilds.  It would also benefit future
pkgcheck checks for user/group ID collisions.
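A collision check of that kind becomes a simple grouping once the IDs are available from cache. A minimal sketch, assuming the tool has already collected ACCT_USER_ID values into a mapping from package to ID; the package names and IDs are hypothetical, and this does not use any real pkgcheck API.

```python
from collections import defaultdict
from typing import Dict, List


def find_id_collisions(ids: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each ID claimed by more than one package to those packages."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for package, user_id in ids.items():
        owners[user_id].append(package)
    # Keep only IDs with more than one owner; sort owners for stable output.
    return {uid: sorted(pkgs) for uid, pkgs in owners.items() if len(pkgs) > 1}


# Hypothetical values, as if read from ACCT_USER_ID in md5-cache entries.
cache_ids = {
    "acct-user/foo-0": "101",
    "acct-user/bar-0": "102",
    "acct-user/baz-0": "101",
}
print(find_id_collisions(cache_ids))  # {'101': ['acct-user/baz-0', 'acct-user/foo-0']}
```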


Notes:

- since md5-cache uses key-value format and allows for future
extensions, the new values can be added without breaking anything;

- md5-cache is not specified in the PMS, and the whole thing can be
implemented without the need for an EAPI bump;

- I would like to have this implemented consistently in both Portage
and pkgcore;

- we will need to clearly define how to dump arrays.
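As one possible answer to the array question (an assumption, not a settled format), each element could be shell-quoted and joined with spaces. Simple arrays such as PYTHON_COMPAT then read as plain space-separated words, while elements containing whitespace or quotes still round-trip. The helper names are hypothetical.

```python
import shlex
from typing import List


def dump_array(values: List[str]) -> str:
    """Serialize a bash array into one md5-cache value via shell quoting."""
    return shlex.join(values)


def load_array(value: str) -> List[str]:
    """Inverse of dump_array: split a shell-quoted value back into elements."""
    return shlex.split(value)


compat = ["python3_6", "python3_7"]
print(dump_array(compat))  # python3_6 python3_7
```

A plain space join would be simpler, but it cannot represent elements that contain whitespace.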


What do you think?

--
Best regards,
Michał Górny


Re: [RFC] Adding extra vars to md5-cache, for QA&tooling purposes

Zac Medico
On 7/25/19 5:20 AM, Michał Górny wrote:

> What do you think?
Sounds good. Some thoughts:

* Maybe omit QA from the variable name, since it could be
generally useful for things that are unrelated to QA.

* In the md5-cache entry, maybe use a common prefix like EXT_ for the
extra keys in order to distinguish them from normal keys.
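A minimal sketch of the prefix idea, assuming extra keys are written after the normal ones with a fixed EXT_ prefix and stripped again on read. The prefix spelling and function names are illustrative only; keys are sorted here just to make the output deterministic.

```python
from typing import Dict, Tuple

EXT_PREFIX = "EXT_"  # illustrative spelling; not decided in this thread


def format_entry(standard: Dict[str, str], extra: Dict[str, str]) -> str:
    """Render normal keys as-is and extra keys under EXT_PREFIX."""
    lines = [f"{k}={v}" for k, v in sorted(standard.items())]
    lines += [f"{EXT_PREFIX}{k}={v}" for k, v in sorted(extra.items())]
    return "\n".join(lines) + "\n"


def split_entry(text: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate normal keys from prefixed extra keys."""
    standard: Dict[str, str] = {}
    extra: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        if key.startswith(EXT_PREFIX):
            extra[key[len(EXT_PREFIX):]] = value
        else:
            standard[key] = value
    return standard, extra


text = format_entry({"EAPI": "7"}, {"PYTHON_COMPAT": "python3_6 python3_7"})
standard, extra = split_entry(text)
print(extra)  # {'PYTHON_COMPAT': 'python3_6 python3_7'}
```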
--
Thanks,
Zac


Re: [RFC] Adding extra vars to md5-cache, for QA&tooling purposes

Michał Górny
On Thu, 2019-07-25 at 12:57 -0700, Zac Medico wrote:

> Sounds good. Some thoughts:
>
> * Maybe omit QA from the variable name, since it could be
> generally useful for things that are unrelated to QA.
>
> * In the md5-cache entry, maybe use a common prefix like EXT_ for the
> extra keys in order to distinguish them from normal keys.
Yeah, I was thinking of something like '__ext_foo', or '__ext[foo]'.
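If both spellings were still on the table, a reader could recognize either one with a single pattern. A sketch, assuming the variable name is everything after '__ext_' or inside the brackets; ext_var_name is a hypothetical helper.

```python
import re
from typing import Optional

# Matches the two spellings floated above: '__ext_foo' and '__ext[foo]'.
_EXT_KEY = re.compile(r"__ext(?:_(?P<under>\w+)|\[(?P<bracket>[^\]]+)\])")


def ext_var_name(key: str) -> Optional[str]:
    """Return the ebuild variable name for an extra cache key, else None."""
    match = _EXT_KEY.fullmatch(key)
    if match is None:
        return None
    return match.group("under") or match.group("bracket")


print(ext_var_name("__ext_PYTHON_COMPAT"))   # PYTHON_COMPAT
print(ext_var_name("__ext[ACCT_USER_ID]"))   # ACCT_USER_ID
print(ext_var_name("EAPI"))                  # None
```

One difference between the two: '__ext_foo' is itself a valid bash variable name, whereas '__ext[foo]' cannot be mistaken for a plain variable name.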

--
Best regards,
Michał Górny


Re: [RFC] Adding extra vars to md5-cache, for QA&tooling purposes

Michael Orlitzky
On 7/25/19 4:29 PM, Michał Górny wrote:
>>
>> * In the md5-cache entry, maybe use a common prefix like EXT_ for the
>> extra keys in order to distinguish them from normal keys.
>
> Yeah, I was thinking of something like '__ext_foo', or '__ext[foo]'.
>

What are the pros/cons of this? The names refer to global variables, so
they should already be safely namespaced, right?

There is a possibility that an eclass variable name (e.g. PATCHES) could
become standardized at a later date. If that happens, we could wind up
with both FOO and __ext_FOO in the cache, and tools would have to figure
out what to do with zero, one, or both present. (This has happened in
email/web protocols when an X-Foo header was standardized.) It's not the
end of the world, but someone would have to stop and think about it.
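The zero/one/both case needs an explicit policy in every consumer. One possible policy, sketched under the assumption of a '__ext_' prefix: prefer the standardized key, fall back to the prefixed one, and refuse to guess when both are present with different values. The helper is hypothetical.

```python
from typing import Dict, Optional

EXT_PREFIX = "__ext_"  # hypothetical prefix, per the spelling discussed above


def lookup(entry: Dict[str, str], name: str) -> Optional[str]:
    """Return the value for `name`, preferring a standardized key."""
    plain = entry.get(name)
    prefixed = entry.get(EXT_PREFIX + name)
    if plain is not None and prefixed is not None and plain != prefixed:
        # Both present and disagreeing: surface the conflict instead of guessing.
        raise ValueError(f"conflicting values for {name}: {plain!r} vs {prefixed!r}")
    return plain if plain is not None else prefixed


print(lookup({"__ext_PATCHES": "fix.patch"}, "PATCHES"))  # fix.patch
```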

Finally, just having the name be predictable so that I can grep '^FOO='
without having to care where it came from is nice.

OTOH for testing, and for figuring out why these weird variables are
showing up in my cache, the prefix would help.


Re: [RFC] Adding extra vars to md5-cache, for QA&tooling purposes

Fabian Groffen
Hi,

On 25-07-2019 14:20:50 +0200, Michał Górny wrote:
> Hi,
>
> TL;DR: I'd like to make it possible for ebuilds to define additional
> variables that will be stored in md5-cache.  This would be useful for CI
> and other tooling that right now has to parse ebuilds for other data.

The only downside I can think of is a disk space increase for the md5-cache.
Not sure if this is going to be substantial, but given things like
PYTHON_COMPAT, perhaps a quick calculation of the extra "cost" can be made.
Should disk space become a problem, one could consider using a separate
file/dir that users could rsync-exclude, since Portage won't need it to
operate properly.
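A quick calculation of that kind could look like this: sum the size of the extra KEY=VALUE line over all entries that would carry it. A sketch, assuming UTF-8 and already-parsed entries (e.g. dicts built from md5-cache files); the sample data is made up.

```python
from typing import Dict, Iterable


def extra_bytes(entries: Iterable[Dict[str, str]], key: str) -> int:
    """Total bytes one extra KEY=VALUE line would add across entries."""
    total = 0
    for entry in entries:
        value = entry.get(key)
        if value is not None:
            # One 'KEY=VALUE\n' line per entry, assuming UTF-8 encoding.
            total += len(f"{key}={value}\n".encode("utf-8"))
    return total


# Made-up sample: two packages with PYTHON_COMPAT, one without.
sample_entries = [
    {"PYTHON_COMPAT": "python3_6 python3_7"},
    {"PYTHON_COMPAT": "python2_7"},
    {},
]
print(extra_bytes(sample_entries, "PYTHON_COMPAT"))  # 58
```

This counts logical bytes only. Each md5-cache entry is a separate small file, so actual disk usage also depends on the filesystem block size and may not grow at all for entries that stay within their current block.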

Thanks,
Fabian

--
Fabian Groffen
Gentoo on a different level

Re: [RFC] Adding extra vars to md5-cache, for QA&tooling purposes

Zac Medico
On 7/25/19 11:49 PM, Fabian Groffen wrote:

> Hi,
>
> On 25-07-2019 14:20:50 +0200, Michał Górny wrote:
>> Hi,
>>
>> TL;DR: I'd like to make it possible for ebuilds to define additional
>> variables that will be stored in md5-cache.  This would be useful for CI
>> and other tooling that right now has to parse ebuilds for other data.
>
> The only downside I can think of is a disk space increase for the md5-cache.
> Not sure if this is going to be substantial, but given things like
> PYTHON_COMPAT, perhaps a quick calculation of the extra "cost" can be made.
> Should disk space become a problem, one could consider using a separate
> file/dir that users could rsync-exclude, since Portage won't need it to
> operate properly.
Yes, using a separate directory from md5-cache will provide useful
isolation. There's a lot of potential for bloat here, and by keeping it
separate we can easily render the bloat harmless.
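A minimal sketch of the separate-directory variant, assuming a hypothetical metadata/ext-cache/ tree that mirrors the md5-cache layout of one file per <category>/<PF>. Since Portage itself would not read it, users could exclude that directory when syncing. The directory name and helper are illustrative only.

```python
import tempfile
from pathlib import Path
from typing import Dict


def write_ext_entry(root: Path, cpv: str, extra: Dict[str, str]) -> Path:
    """Write one package's extra vars to root/<category>/<PF>."""
    category, pf = cpv.split("/", 1)
    path = root / category / pf
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same KEY=VALUE line format as md5-cache, sorted for stable output.
    body = "".join(f"{k}={v}\n" for k, v in sorted(extra.items()))
    path.write_text(body, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical location next to metadata/md5-cache/.
    root = Path(tmp) / "metadata" / "ext-cache"
    written = write_ext_entry(root, "dev-python/foo-1.0", {"PYTHON_COMPAT": "python3_7"})
    print(written.read_text(encoding="utf-8"), end="")
```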

> Thanks,
> Fabian
--
Thanks,
Zac

