How to deal with git sources?

Mike Auty-3
Hiya,

I'm trying to update/package up mypy for the main tree which, whilst it
provides a release tarball, relies upon a data library (typeshed) which
does not provide releases.  The recommended method of installation by
upstream is to use git submodules (which the ebuild will do happily),
but repoman rightly complains about LIVEVCS issues.

Is the current recommended method for dealing with this to manually
create a tarball and stash it somewhere accessible on dev.gentoo.org, or
are there updated guidelines for this kind of scenario?  If so, where
would they be documented?  Searching for LIVEVCS found a bunch of
repoman change discussions, but no documentation on how to deal with
ebuilds that require this...

Mike  5:)


Re: How to deal with git sources?

Mike Auty-3
So,

tl;dr: GitHub will tarball up specific commits (or master) for you to add
to SRC_URI.

I ended up using the GitHub API to pull down a tarball of the git repo,
rather than using git to pull it down.  I suppose that offers the
ability to Manifest it and check for changes, but I now have to encode
the fixed commit into every version of the ebuild, because the only
place it's recorded is the submodule commit hash in the package's
repo.

I could have it live in the PV, but I think that'd lead to potential
version sorting errors if people try and use copies of the ebuild.
Making typeshed its own package doesn't help because it's installed
under /usr/share/mypy/typeshed, not /usr/share/typeshed so it really is
part of mypy (for now).  So I'll see how this goes and listen for
feedback...
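
As a rough illustration of pinning a commit this way, the Python sketch
below builds the kind of URL that could go into SRC_URI.  The URL
pattern, the helper name and the commit hash are all assumptions made
for the example; none of them is taken from the actual ebuild.

```python
# Hypothetical helper: builds the URL of a GitHub-generated tarball for one
# pinned commit. The URL pattern is an assumption based on GitHub's public
# archive links, not something spelled out in this thread.
def github_archive_url(owner, repo, commit, ext="tar.gz"):
    return f"https://github.com/{owner}/{repo}/archive/{commit}.{ext}"


# Placeholder hash; a real ebuild would pin the submodule commit recorded
# in the mypy release being packaged.
TYPESHED_COMMIT = "0123456789abcdef0123456789abcdef01234567"

print(github_archive_url("python", "typeshed", TYPESHED_COMMIT))
```

Because the hash is part of the URL, every version bump of the ebuild
has to carry its own value, which is the bookkeeping cost mentioned
above.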

Mike  5:)


Re: How to deal with git sources?

Duncan-42
Mike Auty posted on Sun, 11 Mar 2018 19:19:00 +0000 as excerpted:

> tldr; Github will tarball up specific commits (or master) for you to add
> to SRC_URI.
>
> I ended up using the github API to pull down a tarball of the git repo,
> rather than using git to pull it down.  I suppose that offers the
> ability to Manifest it and check for changes, but I now have to encode
> the fixed commit into every version of the ebuild because the only
> location it's recorded is in the submodule commit hash of the package's
> repo.

Please check...

If I'm recalling correctly a warning posted on this list, repeated calls
to the github tarballing API for the same commit will result in delivery
of tarballs with differing checksums.  As I recall, how and why was never
explained, which is partly why I'm not sure I'm recalling things
correctly: I would have mentally flagged the claim as unreliable and
needing verification.  But that was the warning as I remember it.

If it's correct, you can pull the tarball from github to store on devspace
and link it as the checksummed tarball, as that's static and won't
change, but you can't link the github tarballing API directly, as that
/will/ change and thus will fail sources checksum verification at least
some of the time.
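
One way to check this yourself is to download the same commit's tarball
twice and compare digests.  A minimal Python sketch follows; the byte
strings are stand-ins for two real downloads, the helper names are
hypothetical, and the choice of BLAKE2B and SHA512 is an assumption
about what a Manifest entry would record.

```python
import hashlib


def distfile_digests(data):
    """Return the size and hashes of one downloaded tarball."""
    return {
        "size": len(data),
        "BLAKE2B": hashlib.blake2b(data).hexdigest(),
        "SHA512": hashlib.sha512(data).hexdigest(),
    }


def same_distfile(first, second):
    """True if both downloads would satisfy the same checksum entry."""
    return distfile_digests(first) == distfile_digests(second)


# Stand-in byte strings; in practice these would be two downloads of the
# same commit's tarball, fetched at different times.
download_a = b"example tarball bytes"
download_b = b"example tarball bytes"
print(same_distfile(download_a, download_b))
```

If the two downloads ever compare unequal, linking the generated
tarball directly would fail verification for some users.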

But (assuming that avoiding a devspace link is worth the trouble in the
first place) either verify it yourself or wait for others to confirm or
refute it, as I'm not entirely sure I'm recalling that warning post
correctly.  It might have been about something other than GitHub, or I
might have misunderstood, or maybe they've fixed that problem by now, or...

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Re: How to deal with git sources?

Martin Vaeth-2
Duncan <[hidden email]> wrote:
>
> If I'm recalling correctly a warning posted on this list, repeated calls
> to the github tarballing API for the same commit will result in delivery
> of tarballs with differing checksums.

This was so many many years ago in the beginning of github.
This has long been fixed since then.
Actually, GitHub just uses "git archive", which always produces
the same tarball.
The only possible exception is if you use .zip instead of .tar.gz,
because the former (due to FAT compatibility in the format)
has to store timestamps in local time, which depends on the timezone.
So if the timezone of the GitHub server ever changes, the .zip
"tar"ball might also change.


Re: How to deal with git sources?

Michael Orlitzky
On 03/12/2018 04:29 AM, Martin Vaeth wrote:
> Duncan <[hidden email]> wrote:
>>
>> If I'm recalling correctly a warning posted on this list, repeated calls
>> to the github tarballing API for the same commit will result in delivery
>> of tarballs with differing checksums.
>
> This was so many many years ago in the beginning of github.
> This has long been fixed since then.

Every once in a while they still change. This is from a few weeks ago:

https://marc.info/?l=openbsd-ports&m=151973450514279&w=2

Re: How to deal with git sources?

Martin Vaeth-2
Michael Orlitzky <[hidden email]> wrote:
> On 03/12/2018 04:29 AM, Martin Vaeth wrote:
>> This was so many many years ago in the beginning of github.
>> This has long been fixed since then.
>
> Every once in a while they still change. This is from a few weeks ago:
>
> https://marc.info/?l=openbsd-ports&m=151973450514279&w=2

As mentioned, GitHub uses "git archive" to generate the tarballs.
So, theoretically, if a new version of git implemented "git archive"
differently (which might happen indirectly through new versions
of tar/zlib), there might indeed be a change.

However, this is purely theoretical: I just upgraded to the most current
git-2.16.2 and tar-1.30, and checked: the generated tarballs still match
those from 2014. So I really do not know which change from
"a few weeks ago" the above message refers to: I could not detect any
change (and I really checked a lot of packages now).

Perhaps they refer to .zip instead of .tar.gz which as mentioned is
a less stable format due to the inclusion of the timezone.


Re: How to deal with git sources?

Vadim A. Misbakh-Soloviov-2
> Perhaps they refer to .zip instead of .tar.gz which as mentioned is
> a less stable format due to the inclusion of the timezone.

Nope. I have also run into tarball checksum differences myself (even
between a few calls).

GH support answered me (in TL;DR version) "that's because we've upgraded git
on *some* of our nodes" (means, some other using older git), and "we've never
guaranteed same checksums".



Re: How to deal with git sources?

Martin Vaeth-2
Vadim A. Misbakh-Soloviov <[hidden email]> wrote:
>
> GH support answered me (in TL;DR version) "that's because we've upgraded git
> on *some* of our nodes" (means, some other using older git)

That would still require that the "git archive" output would have
changed in some recent git versions. And at least between the most
current 2.16.2 and comparing with all my git tarballs (some as
mentioned rather old), I could not produce any difference.
So I still do not understand what should be going on.


Re: How to deal with git sources?

NP-Hardass-2
On 03/15/2018 07:20 PM, Martin Vaeth wrote:

> Vadim A. Misbakh-Soloviov <[hidden email]> wrote:
>>
>> GH support answered me (in TL;DR version) "that's because we've upgraded git
>> on *some* of our nodes" (means, some other using older git)
>
> That would still require that the "git archive" output would have
> changed in some recent git versions. And at least between the most
> current 2.16.2 and comparing with all my git tarballs (some as
> mentioned rather old), I could not produce any difference.
> So I still do not understand what should be going on.
>
>
IIRC, it was attributed to
https://github.com/git/git/commit/22f0dcd9634a818a0c83f23ea1a48f2d620c0546

--
NP-Hardass


Re: How to deal with git sources?

Ulrich Mueller-2
In reply to this post by Martin Vaeth-2
>>>>> On Thu, 15 Mar 2018, Martin Vaeth wrote:

> Vadim A. Misbakh-Soloviov <[hidden email]> wrote:
>>
>> GH support answered me (in TL;DR version) "that's because we've
>> upgraded git on *some* of our nodes" (means, some other using older
>> git)

> That would still require that the "git archive" output would have
> changed in some recent git versions. And at least between the most
> current 2.16.2 and comparing with all my git tarballs (some as
> mentioned rather old), I could not produce any difference.
> So I still do not understand what should be going on.

I think the conclusion is that github generates tarballs on the fly,
and therefore we cannot rely on them being invariant over a long time.
They may be susceptible to changes in git, in tar, or in gzip.

Ulrich

Re: How to deal with git sources?

Ulrich Mueller-2
>>>>> On Fri, 16 Mar 2018, Ulrich Mueller wrote:

> I think the conclusion is that github generates tarballs on the fly,
> and therefore we cannot rely on them being invariant over a long time.
> They may be susceptible to changes in git, in tar, or in gzip.

In fact, only git and gzip (because the tar archive is built by git
itself). This doesn't change the argument, though.
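
To illustrate how the compression step alone can change a checksum,
here is a small Python sketch.  It compresses the same byte stream with
two different settings, using Python's gzip module as a stand-in for
whatever compressor the server runs; the sample data and the helper
name are assumptions made for the example.

```python
import gzip
import hashlib


def payload_digest(gz_bytes):
    """Hash the decompressed stream, ignoring how it was compressed."""
    return hashlib.sha512(gzip.decompress(gz_bytes)).hexdigest()


# Stand-in for the tar stream that "git archive" would produce.
tar_stream = b"".join(b"stub line %d\n" % i for i in range(2000))

# Same input, two compressor settings; mtime=0 keeps the gzip header
# free of the current time so only the compression settings differ.
fast = gzip.compress(tar_stream, compresslevel=1, mtime=0)
best = gzip.compress(tar_stream, compresslevel=9, mtime=0)

print("compressed bytes identical:", fast == best)
print("payload identical:", payload_digest(fast) == payload_digest(best))
```

A checksum taken over the .tar.gz file sees the first comparison, not
the second, so it can change even when the archived files did not.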

Ulrich

Re: How to deal with git sources?

Martin Vaeth-2
In reply to this post by NP-Hardass-2
NP-Hardass <[hidden email]> wrote:
>
> IIRC, it was attributed to
> https://github.com/git/git/commit/22f0dcd9634a818a0c83f23ea1a48f2d620c0546

Thanks. That explains why I was not able to produce a difference:
It involves only the rather exotic case that a path in a git repository
is longer than 100 characters.
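
For anyone wanting to check whether a given tarball falls into that
case, the following Python sketch lists member paths that do not fit
into the 100-byte name field of the classic tar header.  The helper
names and the sample paths are hypothetical, and the sketch only
inspects path lengths; it does not reproduce what git itself does.

```python
import io
import tarfile

USTAR_NAME_FIELD = 100  # bytes available in the classic tar name field


def build_sample(paths):
    """Create an in-memory tar archive with empty files at the given paths."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in paths:
            info = tarfile.TarInfo(name=path)
            info.size = 0
            tar.addfile(info)
    return buf.getvalue()


def long_paths(tar_bytes):
    """Return member names too long for the plain name field."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        return [
            member.name
            for member in tar.getmembers()
            if len(member.name.encode("utf-8")) > USTAR_NAME_FIELD
        ]


# Made-up paths: one short, one deliberately longer than 100 bytes.
short_path = "stdlib/os.pyi"
deep_path = "a/" * 60 + "f.pyi"
sample = build_sample([short_path, deep_path])
print(long_paths(sample))
```

An empty result suggests the archive is not affected by this
particular change.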


Re: How to deal with git sources?

Martin Vaeth-2
In reply to this post by Ulrich Mueller-2
Ulrich Mueller <[hidden email]> wrote:
>
> I think the conclusion is that github generates tarballs on the fly,
> and therefore we cannot rely on them being invariant over a long time.

Yes, but with emphasis on _long_ time and on theory.
In practice this has now happened exactly _once_ in a decade
(according to all we have learnt so far), for the understandable
reason of fixing an annoying incompatibility in an exotic case.
And the existence of zopfli shows that other backward-compatible
improvements _would_ have been possible, but apparently keeping the
produced tarball unchanged was always rated higher than anything else
(up to this exception).

So I would not worry too much about it: It is not worth the cost of
hosting a huge number of tarballs permanently (or to convince
upstream to let them be hosted by github for every single version,
only because one cannot theoretically exclude that a similar thing
won't ever happen again). Yes, for the transition period (until all
github servers use a new enough version) a solution for the few involved
tarballs has to be found (like temporarily hosting on devspace).
But after this period it is only a question of updating the
checksum once for the involved packages.


Re: How to deal with git sources?

James Le Cuirot
On Fri, 16 Mar 2018 10:03:44 +0000 (UTC)
Martin Vaeth <[hidden email]> wrote:

> So I would not worry too much about it: It is not worth the cost of
> hosting a huge number of tarballs permanently (or to convince
> upstream to let them be hosted by github for every single version,
> only because one cannot theoretically exclude that a similar thing
> won't ever happen again). Yes, for the transition period (until all
> github servers use a new enough version) a solution for the few
> involved tarballs has to be found (like temporarily hosting on
> devspace). But after this period it is only a question of updating the
> checksum once for the involved packages.

Agreed. I use this GitHub feature quite a lot and I've only ever seen
this happen maybe once? Even then, I think it might have been one of
the additional downloads rather than the git archives, which upstream
had probably replaced without bumping.

--
James Le Cuirot (chewi)
Gentoo Linux Developer

Re: How to deal with git sources?

Ulrich Mueller-2
In reply to this post by Martin Vaeth-2
>>>>> On Fri, 16 Mar 2018, Martin Vaeth wrote:

> Ulrich Mueller <[hidden email]> wrote:
>>
>> I think the conclusion is that github generates tarballs on the
>> fly, and therefore we cannot rely on them being invariant over a
>> long time.

> So I would not worry too much about it: It is not worth the cost of
> hosting a huge number of tarballs permanently

I agree, because hosting tarballs of upstream packages is not a task
for us as a distro.

> (or to convince upstream to let them be hosted by github for every
> single version, only because one cannot theoretically exclude that a
> similar thing won't ever happen again). [...]

In the first place, upstream should make proper releases, which
includes creating a pristine tarball and permanently hosting it.
So, yell at them if they don't. And no, a git tag is not a release.

Ulrich

Re: How to deal with git sources?

Michał Górny-5
On Fri, 16.03.2018 at 12:00 +0100, Ulrich Mueller wrote:

> >>>>> On Fri, 16 Mar 2018, Martin Vaeth wrote:
> > Ulrich Mueller <[hidden email]> wrote:
> > >
> > > I think the conclusion is that github generates tarballs on the
> > > fly, and therefore we cannot rely on them being invariant over a
> > > long time.
> > So I would not worry too much about it: It is not worth the cost of
> > hosting a huge number of tarballs permanently
>
> I agree, because hosting tarballs of upstream packages is not a task
> for us as a distro.
>
> > (or to convince upstream to let them be hosted by github for every
> > single version, only because one cannot theoretically exclude that a
> > similar thing won't ever happen again). [...]
>
> In the first place, upstream should make proper releases, which
> includes creating a pristine tarball and permanently hosting it.
> So, yell at them if they don't. And no, a git tag is not a release.
>

Feel free to convince Python upstreams to include tests in their
releases. Last I tried, I heard that tests are not useful for people who
install packages, and that they would make tarballs bigger.

--
Best regards,
Michał Górny


Re: How to deal with git sources?

William Hubbs
In reply to this post by Ulrich Mueller-2
On Fri, Mar 16, 2018 at 12:00:47PM +0100, Ulrich Mueller wrote:

> >>>>> On Fri, 16 Mar 2018, Martin Vaeth wrote:
>
> > Ulrich Mueller <[hidden email]> wrote:
> >>
> >> I think the conclusion is that github generates tarballs on the
> >> fly, and therefore we cannot rely on them being invariant over a
> >> long time.
>
> > So I would not worry too much about it: It is not worth the cost of
> > hosting a huge number of tarballs permanently
>
> I agree, because hosting tarballs of upstream packages is not a task
> for us as a distro.
>
> > (or to convince upstream to let them be hosted by github for every
> > single version, only because one cannot theoretically exclude that a
> > similar thing won't ever happen again). [...]
>
> In the first place, upstream should make proper releases, which
> includes creating a pristine tarball and permanently hosting it.
> So, yell at them if they don't. And no, a git tag is not a release.
>
> Ulrich

Yelling at an upstream might get you somewhere, but you can't force
them, and in this case, they might tell you to create the pristine
tarball yourself using "git archive" if you want one.

I am in the same camp as Martin and James. I would rather see the issues
fixed for the specific packages involved than us try to host tarballs
for every package that doesn't create them.

William


Re: How to deal with git sources?

Rich Freeman
On Fri, Mar 16, 2018 at 1:21 PM, William Hubbs <[hidden email]> wrote:
>
> I am in the same camp as Martin and James. I would rather see the issues
> fixed for the specific packages involved than us try to host tarballs
> for every package that doesn't create them.
>

++

If github didn't already provide a solution that works 95% of the time
I'd consider it more of a need.

And while a lot of people have issues with github, IMO this one part
of github is largely FOSS (they're just using git archive here).
Simply serving up git and git archive tarballs is something that could
be easily moved to another hosting provider if somebody stepped up to
offer the service.  If somebody were offering to build a homegrown
solution that would probably be fine, but I'm not really seeing
that...

--
Rich