RFC GLEP 1005: Package Tags

Re: RFC GLEP 1005: Package Tags

Michał Górny-5
On 2014-03-23, at 16:27:43,
Joshua Kinard <[hidden email]> wrote:

> On 03/23/2014 15:44, Michał Górny wrote:
> > Tags, on the other hand, are more 'live'. They place the package
> > somewhere in the 'global' tag hierarchy that can change over time.
> > I expect that people other than maintainers will be adding tags to
> > packages (and changing them), and that people will invent new tags
> > and apply them to more packages.
> >
> > So, first of all, your solution would mean that every commit adding
> > a new tag or changing one of the tags would modify the package
> > metadata.xml. This means a Manifest update and a ChangeLog entry (please
> > don't get into more rules for ChangeLogs now), and this means it will be
> > harder to find actually useful entries there.
> >
> > So we make tag updates harder, and increase time and size of rsync.
>
> Instead of individual <tag> lines in metadata.xml for each tag, why not a
> single <tags> line that contains a comma-delimited list of up to five tags,
> whitespace optional?  That should help reduce the "fluff" of the tree by
> adding this feature.
>
> E.g.,
>
> <tags>one,two,three,four,five</tags>

Either use XML, or don't use XML. Don't make this some kind of ugly
mixture of XML with non-XML.

So:

  <tags>
    <tag>one</tag>
    <tag>two</tag>
  </tags>

if we're really going for this. But I guess our DTD doesn't allow easy
definition of single <tags/> with no forced position.

> > Secondly, since tags for every package will be held in different files,
> > people will need dedicated tools to collect tags from all those files
> > and add matching tags to their own packages. Long story short, we're
> > going to have many 'duplicate' tags that will require even more commits
> > with ChangeLog entries and Manifest updates.
>
> If we automate the generation of a master tag index file, like
> use.desc.local, this can be avoided.  emerge can simply go rummage through
> the master index for matching tag entries instead of going through the
> entire tree.  Because if we wanted to sift through the entire tree, grep
> would be a far better method (compiled C and probably better text-matching
> algorithms than emerge).

And this runs pretty much counter to what we were aiming at. We
should finally kill use.local.desc, not get inspired by the redundancy.

> > Worse than that, your GLEP doesn't even have any basic rules for naming
> > tags -- like what language form to use and, say, which character to use
> > instead of space. This sounds like the sort of things that's going to
> > make it even harder to get some consistency, especially if some
> > developers are going to follow someone else committing earlier and some
> > will follow their own rules.
>
> Easy: ASCII, alphanumeric only, must start with a letter, lowercase, no
> spaces.  A lot of problems are avoided if we keep tags to one-word
> descriptors only.  E.g., for mail clients, they would carry both 'mail' and
> 'client' as two of their five tags.  For kmail, a third tag would be 'kde'
> and Evolution would have 'gnome' instead.

I'm pretty sure you will eventually hit something that takes two
words. A protocol name or something.

> I'd also suggest that 'all' be considered a default, global tag for all
> packages, it be a reserved tag internal to emerge and other package
> managers, and not count against the number of allowed tags (meaning that
> technically, a package is allow five tags + 'all').
>
> As for default tags when a package does not define any, the package category
> gets split at the hyphen and becomes two independent tags.  This is
> overridden when at least one tag is defined in metadata.xml.

Will this have a real benefit? Sounds like unnecessary confusion for
a minor gain to me.

--
Best regards,
Michał Górny


Re: RFC GLEP 1005: Package Tags

Joshua Kinard-2
On 03/23/2014 17:05, Michał Górny wrote:

> On 2014-03-23, at 16:27:43,
> Joshua Kinard <[hidden email]> wrote:
>
>> On 03/23/2014 15:44, Michał Górny wrote:
>>> Tags, on the other hand, are more 'live'. They place the package
>>> somewhere in the 'global' tag hierarchy that can change over time.
>>> I expect that people other than maintainers will be adding tags to
>>> packages (and changing them), and that people will invent new tags
>>> and apply them to more packages.
>>>
>>> So, first of all, your solution would mean that every commit adding
>>> a new tag or changing one of the tags would modify the package
>>> metadata.xml. This means a Manifest update and a ChangeLog entry (please
>>> don't get into more rules for ChangeLogs now), and this means it will be
>>> harder to find actually useful entries there.
>>>
>>> So we make tag updates harder, and increase time and size of rsync.
>>
>> Instead of individual <tag> lines in metadata.xml for each tag, why not a
>> single <tags> line that contains a comma-delimited list of up to five tags,
>> whitespace optional?  That should help reduce the "fluff" of the tree by
>> adding this feature.
>>
>> E.g.,
>>
>> <tags>one,two,three,four,five</tags>
>
> Either use XML, or don't use XML. Don't make this some kind of ugly
> mixture of XML with non-XML.
>
> So:
>
>   <tags>
>     <tag>one</tag>
>     <tag>two</tag>
>   </tags>
>
> if we're really going for this. But I guess our DTD doesn't allow easy
> definition of single <tags/> with no forced position.

TBH, I don't like the use of XML at all.  Never have and never will.  I am a
big fan of INI-style definitions (i.e., like Samba's config).  XML just
leads to a lot of unneeded fluff in what should be a really small file,
which is why I was proposing a single <tags> element instead of multiple
<tag> elements.

E.g., for local USE flags, instead of this:

<use>
<flag name='foo'>FOO</flag>
<flag name='bar'>BAR</flag>
<flag name='baz'>BAZ</flag>
</use>

(96 bytes)

This would be better:

[local use]
foo = "FOO"
bar = "BAR"
baz = "BAZ"

(47 bytes)

Not a complicated example, but it would be a >50% reduction in size.  But, I
digress...
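
For anyone who wants to check those byte counts, here is a quick sketch in
Python.  It assumes the sizes were counted as ASCII with newlines between
lines but no trailing newline; that assumption is mine, not something stated
above.

```python
# The two snippets from above, joined with newlines and no trailing newline
# (an assumption about how the byte counts were taken).
xml_snippet = "\n".join([
    "<use>",
    "<flag name='foo'>FOO</flag>",
    "<flag name='bar'>BAR</flag>",
    "<flag name='baz'>BAZ</flag>",
    "</use>",
])
ini_snippet = "\n".join([
    "[local use]",
    'foo = "FOO"',
    'bar = "BAR"',
    'baz = "BAZ"',
])

xml_size = len(xml_snippet.encode("ascii"))
ini_size = len(ini_snippet.encode("ascii"))
reduction = 1 - ini_size / xml_size  # fraction of bytes saved by the INI form

print(xml_size, ini_size, f"{reduction:.0%}")
```

Under that assumption the numbers come out to 96 and 47 bytes, a reduction of
roughly 51%.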


>>> Secondly, since tags for every package will be held in different files,
>>> people will need dedicated tools to collect tags from all those files
>>> and add matching tags to their own packages. Long story short, we're
>>> going to have many 'duplicate' tags that will require even more commits
>>> with ChangeLog entries and Manifest updates.
>>
>> If we automate the generation of a master tag index file, like
>> use.desc.local, this can be avoided.  emerge can simply go rummage through
>> the master index for matching tag entries instead of going through the
>> entire tree.  Because if we wanted to sift through the entire tree, grep
>> would be a far better method (compiled C and probably better text-matching
>> algorithms than emerge).
>
> And this goes pretty much backwards to what we were aiming at. We
> should finally kill use.desc.local, not get inspired by the redundancy.

And what replaces it?  What differentiates a global USE flag that has
purpose across multiple packages (like 'ipv6') from a flag that only
exists for a single package?

I'll agree that USE flags have definitely gotten out of control, and the
trend now seems to be moving sharply away from a global USE definition in
make.conf and toward per-package USE flags in
/etc/portage/package.use.  That offers more granular control, but it can
be mind-numbingly annoying at times.

The automated generation of use.local.desc definitely made maintenance of
some things easier.  We've gotta index USE flags somehow, and separating
them into global and local categories still makes sense to me.  But, I'm
probably just going senile...


>>> Worse than that, your GLEP doesn't even have any basic rules for naming
>>> tags -- like what language form to use and, say, which character to use
>>> instead of space. This sounds like the sort of things that's going to
>>> make it even harder to get some consistency, especially if some
>>> developers are going to follow someone else committing earlier and some
>>> will follow their own rules.
>>
>> Easy: ASCII, alphanumeric only, must start with a letter, lowercase, no
>> spaces.  A lot of problems are avoided if we keep tags to one-word
>> descriptors only.  E.g., for mail clients, they would carry both 'mail' and
>> 'client' as two of their five tags.  For kmail, a third tag would be 'kde'
>> and Evolution would have 'gnome' instead.
>
> I'm pretty sure you will finally hit something that goes with two
> words. Protocol name or something.

Perhaps, but we can fight that battle when we get there.  Starting off with
one-word tags keeps things simple for now and that'll make it easier to
determine whether this experiment actually pans out or not.
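
To make the one-word rule concrete, a minimal sketch of a checker in Python.
The names TAG_RE, MAX_TAGS and validate_tags are made up for illustration;
nothing like this exists in Portage or repoman as far as this thread goes.

```python
import re

# Rule as quoted above: ASCII, alphanumeric only, lowercase, no spaces,
# and the first character must be a letter.
TAG_RE = re.compile(r"[a-z][a-z0-9]*")
MAX_TAGS = 5  # the "up to five tags" limit from the proposal


def validate_tags(tags):
    """Return a list of problems; an empty list means the tags pass."""
    problems = [
        f"invalid tag name: {tag!r}" for tag in tags if not TAG_RE.fullmatch(tag)
    ]
    if len(tags) > MAX_TAGS:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    return problems


print(validate_tags(["mail", "client", "kde"]))  # passes
print(validate_tags(["Mail", "file-sharing"]))   # both names are rejected
```

Note that a hyphenated two-word tag is rejected by this rule, which is
exactly the case being argued about.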


>> I'd also suggest that 'all' be considered a default, global tag for all
>> packages, it be a reserved tag internal to emerge and other package
>> managers, and not count against the number of allowed tags (meaning that
>> technically, a package is allow five tags + 'all').
>>
>> As for default tags when a package does not define any, the package category
>> gets split at the hyphen and becomes two independent tags.  This is
>> overridden when at least one tag is defined in metadata.xml.
>
> Will this have a real benefit? Sounds like unnecessary confusion for
> a minor gain to me.

Which?  The internal 'all' tag or the use of existing category names as a
default set of tags for packages that don't have any tags defined?

The 'all' thing is probably unnecessary, as the same effect can be done with
wildcarding or some other programming trick.  The latter is just a way to
avoid having to handle the lack of tags.  Because if this is implemented,
it's going to take years for most of the packages in the tree to get tags
assigned to them.  By having a default set of tags to link most packages to,
it makes finding them via a tag search easy.  E.g., even if a particular
package in dev-python lacks tags, you can still find it by searching for the
tag "python".

Granted, a tag of "dev" offers no value (dev-python -> 'dev','python'), but
if you were looking for a web browser versus a web server, having default
tags of 'www','client' or 'www','servers' helps for packages in www-client
and www-servers.


Tags aside, wasn't there a proposal long ago to re-categorize the entire
tree because someone felt that the double-atom naming mechanism for
categories (atom1-atom2) was neither flexible nor descriptive enough?  The
entire Portage tree idea derives from Ports, and it's really ballooned over
the years, while a modern-day Ports tree in /usr/ports is still pretty small
and self-contained.  I've always wondered whether allowing Portage one
additional level of nesting would help any (i.e., games-* -> games/*).
It really seems like this is what tags are attempting to solve, so maybe that
problem needs to be revisited instead.


--
Joshua Kinard
Gentoo/MIPS
[hidden email]
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


Re: RFC GLEP 1005: Package Tags

Alan McKinnon-2
In reply to this post by hasufell
On 23/03/2014 22:08, hasufell wrote:

> Michał Górny:
>> On 2014-03-22, at 15:33:27, Alec Warner <[hidden email]>
>> wrote:
>
>>> https://wiki.gentoo.org/wiki/Package_Tags
>>>
>>> Object or forever hold your peace.
>
>
>> I'd honestly prefer that -- if we should really keep tags in the
>> tree -- to do that with a single 'metadata/tags' file or some kind
>> of hierarchy there. Keep them outside the package directory --
>> bind packages to tags, rather than tags to packages. Keep all the
>> commits in a single place without altering the ebuild work flow.
>
>
> That sounds better. That way it is also easier to get some
> consistency. E.g. tags can be discussed... but adding packages to tags
> is up to the maintainers.
>
> The GLEP should maybe cover a basic set of tags. Then projects like
> games, science etc could add their sets as well which may be a bit
> more specific... instead of random maintainers adding random tags.


Regular user/sysadmin chipping in:

This topic seems a lot like a solution seeking a problem to solve, or
alternatively a dev is looking for an easy way to describe stuff. Not
that there's anything wrong with that, but the proposal as written is
way too vague to be useful.

Tags work best when they describe narrow, clearly defined attributes,
and the thing they are applied to can have one, two or more of these
attributes or sometimes even none. Music and movie genres are an
excellent example - there are only so many of them and for the most part
one can tell whether a tag really is a genre or not.

Nothing resembling such limits is proposed in this GLEP; there's not
even a recommendation of what the tags will describe or how everything
will be tagged equally. What happens if someone zealously over-tags all
of gnome and the same thing doesn't happen for kde? Does kde just not
show up in tag searches anymore?

So this just seems like a nice-to-have that hasn't been properly thought
through. The main stated use of it is for packages that logically belong
to more than one category. So instead of a general catch-all,
do-whatever-you-want mechanism, let's rather solve that exact problem by,
for example, adding a specific field to metadata, e.g. "supplementary
categories". Pick those that apply from a clearly defined list and store
the data in a clearly defined place.

Such a thing can be made more generic by making it a clear mechanism to
describe extra metadata, where the things to be described go through a
defined process before making it into the list. This concept is
not present in the GLEP as currently written.

--
Alan McKinnon
[hidden email]



Re: RFC GLEP 1005: Package Tags

Michał Górny-5
In reply to this post by Joshua Kinard-2
On 2014-03-23, at 17:40:20,
Joshua Kinard <[hidden email]> wrote:

> On 03/23/2014 17:05, Michał Górny wrote:
> > On 2014-03-23, at 16:27:43,
> > Joshua Kinard <[hidden email]> wrote:
> >
> >> On 03/23/2014 15:44, Michał Górny wrote:
> >>> Tags, on the other hand, are more 'live'. They place the package
> >>> somewhere in the 'global' tag hierarchy that can change over time.
> >>> I expect that people other than maintainers will be adding tags to
> >>> packages (and changing them), and that people will invent new tags
> >>> and apply them to more packages.
> >>>
> >>> So, first of all, your solution would mean that every commit adding
> >>> a new tag or changing one of the tags would modify the package
> >>> metadata.xml. This means a Manifest update and a ChangeLog entry (please
> >>> don't get into more rules for ChangeLogs now), and this means it will be
> >>> harder to find actually useful entries there.
> >>>
> >>> So we make tag updates harder, and increase time and size of rsync.
> >>
> >> Instead of individual <tag> lines in metadata.xml for each tag, why not a
> >> single <tags> line that contains a comma-delimited list of up to five tags,
> >> whitespace optional?  That should help reduce the "fluff" of the tree by
> >> adding this feature.
> >>
> >> E.g.,
> >>
> >> <tags>one,two,three,four,five</tags>
> >
> > Either use XML, or don't use XML. Don't make this some kind of ugly
> > mixture of XML with non-XML.
> >
> > So:
> >
> >   <tags>
> >     <tag>one</tag>
> >     <tag>two</tag>
> >   </tags>
> >
> > if we're really going for this. But I guess our DTD doesn't allow easy
> > definition of single <tags/> with no forced position.
>
> TBH, I don't like the use of XML at all.  Never have and never will.  I am a
> big fan of INI-style definitions (i.e., like Samba's config).  XML just
> leads to a lot of unneeded fluff in what should be a really small file,
> which is why I was proposing a single <tags> element instead of multiple
> <tag> elements.

metadata.xml is XML at the moment, so you are supposed to obey its
rules, whether you like them or not. If you want to replace it with
something else, feel free to try. But don't make a shitsoup mixin out
of it.

> >>> Secondly, since tags for every package will be held in different files,
> >>> people will need dedicated tools to collect tags from all those files
> >>> and add matching tags to their own packages. Long story short, we're
> >>> going to have many 'duplicate' tags that will require even more commits
> >>> with ChangeLog entries and Manifest updates.
> >>
> >> If we automate the generation of a master tag index file, like
> >> use.desc.local, this can be avoided.  emerge can simply go rummage through
> >> the master index for matching tag entries instead of going through the
> >> entire tree.  Because if we wanted to sift through the entire tree, grep
> >> would be a far better method (compiled C and probably better text-matching
> >> algorithms than emerge).
> >
> > And this goes pretty much backwards to what we were aiming at. We
> > should finally kill use.desc.local, not get inspired by the redundancy.
>
> And what replaces it?  What differentiates a global USE flag that has
> purpose across multiple packages (like 'ipv6') against a flag that only
> exists for a single package?

Applications are supposed to read metadata.xml for local flags. That's
all there is to it. Having an extra index file doesn't really make sense
there.
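
As a rough sketch of what reading metadata.xml directly could look like for
a tool, in Python. The function name local_use_flags is hypothetical, and the
code assumes the usual <category>/<package>/metadata.xml layout with
<use><flag name='...'> elements directly under the root element; it is not
how Portage itself does it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_use_flags(tree_root):
    """Map 'category/package' -> {flag name: description}."""
    result = {}
    # Assumed layout: <tree_root>/<category>/<package>/metadata.xml
    for path in sorted(Path(tree_root).glob("*/*/metadata.xml")):
        root = ET.parse(path).getroot()
        flags = {
            flag.get("name"): "".join(flag.itertext()).strip()
            for flag in root.iterfind("use/flag")
        }
        if flags:
            package = f"{path.parent.parent.name}/{path.parent.name}"
            result[package] = flags
    return result
```

Calling local_use_flags() with the path of a tree checkout opens one file per
package, so the cost is one small read per package instead of one read of a
generated index.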

> >>> Worse than that, your GLEP doesn't even have any basic rules for naming
> >>> tags -- like what language form to use and, say, which character to use
> >>> instead of space. This sounds like the sort of things that's going to
> >>> make it even harder to get some consistency, especially if some
> >>> developers are going to follow someone else committing earlier and some
> >>> will follow their own rules.
> >>
> >> Easy: ASCII, alphanumeric only, must start with a letter, lowercase, no
> >> spaces.  A lot of problems are avoided if we keep tags to one-word
> >> descriptors only.  E.g., for mail clients, they would carry both 'mail' and
> >> 'client' as two of their five tags.  For kmail, a third tag would be 'kde'
> >> and Evolution would have 'gnome' instead.
> >
> > I'm pretty sure you will finally hit something that goes with two
> > words. Protocol name or something.
>
> Perhaps, but we can fight that battle when we get there.  starting off with
> one-word tags keeps things simple for now and that'll make it easier to
> determine whether this experiment actually pans out or not.

If you introduce arbitrary limitations, people will either find a way
around them (which means an even worse mess) or omit some tags.
Either way, tags become less helpful.

> >> I'd also suggest that 'all' be considered a default, global tag for all
> >> packages, it be a reserved tag internal to emerge and other package
> >> managers, and not count against the number of allowed tags (meaning that
> >> technically, a package is allow five tags + 'all').
> >>
> >> As for default tags when a package does not define any, the package category
> >> gets split at the hyphen and becomes two independent tags.  This is
> >> overridden when at least one tag is defined in metadata.xml.
> >
> > Will this have a real benefit? Sounds like unnecessary confusion for
> > a minor gain to me.
>
> Which?  The internal 'all' tag or the use of existing category names as a
> default set of tags for packages that don't have any tags defined?

The 'all' tag sounds like something that would have no value.

The automagic tags sound like a way to confuse people -- yesterday it
had this tag, now I wanted to add a new one and the old tag
disappeared! Not to mention sometimes the categories don't give really
useful tags. Tags are not replacing categories, so no point in trying
to bind the two together.

--
Best regards,
Michał Górny


Re: RFC GLEP 1005: Package Tags

Joshua Kinard-2
On 03/23/2014 17:51, Michał Górny wrote:

> On 2014-03-23, at 17:40:20,
> Joshua Kinard <[hidden email]> wrote:
>
>> On 03/23/2014 17:05, Michał Górny wrote:
>>> On 2014-03-23, at 16:27:43,
>>> Joshua Kinard <[hidden email]> wrote:
>>>
>>>> On 03/23/2014 15:44, Michał Górny wrote:
>>>>> Tags, on the other hand, are more 'live'. They place the package
>>>>> somewhere in the 'global' tag hierarchy that can change over time.
>>>>> I expect that people other than maintainers will be adding tags to
>>>>> packages (and changing them), and that people will invent new tags
>>>>> and apply them to more packages.
>>>>>
>>>>> So, first of all, your solution would mean that every commit adding
>>>>> a new tag or changing one of the tags would modify the package
>>>>> metadata.xml. This means a Manifest update and a ChangeLog entry (please
>>>>> don't get into more rules for ChangeLogs now), and this means it will be
>>>>> harder to find actually useful entries there.
>>>>>
>>>>> So we make tag updates harder, and increase time and size of rsync.
>>>>
>>>> Instead of individual <tag> lines in metadata.xml for each tag, why not a
>>>> single <tags> line that contains a comma-delimited list of up to five tags,
>>>> whitespace optional?  That should help reduce the "fluff" of the tree by
>>>> adding this feature.
>>>>
>>>> E.g.,
>>>>
>>>> <tags>one,two,three,four,five</tags>
>>>
>>> Either use XML, or don't use XML. Don't make this some kind of ugly
>>> mixture of XML with non-XML.
>>>
>>> So:
>>>
>>>   <tags>
>>>     <tag>one</tag>
>>>     <tag>two</tag>
>>>   </tags>
>>>
>>> if we're really going for this. But I guess our DTD doesn't allow easy
>>> definition of single <tags/> with no forced position.
>>
>> TBH, I don't like the use of XML at all.  Never have and never will.  I am a
>> big fan of INI-style definitions (i.e., like Samba's config).  XML just
>> leads to a lot of unneeded fluff in what should be a really small file,
>> which is why I was proposing a single <tags> element instead of multiple
>> <tag> elements.
>
> metadata.xml is XML at the moment, so you are supposed to obey its
> rules, whether you like them or not. if you want to replace it with
> something else, feel free to try. But don't make a shitsoup mixin out
> of it.

I'm not proposing to change it now...bit too late for that.  But if I ever
come across a TARDIS on eBay, well...

That said, is XML so strict that every single atom has to be wrapped by
an individual tag?  Is a comma-separated list of values in its own XML tag
prohibited by the spec?  I don't use XML often (if at all), so I am not
familiar with its intricacies.


>>>>> Secondly, since tags for every package will be held in different files,
>>>>> people will need dedicated tools to collect tags from all those files
>>>>> and add matching tags to their own packages. Long story short, we're
>>>>> going to have many 'duplicate' tags that will require even more commits
>>>>> with ChangeLog entries and Manifest updates.
>>>>
>>>> If we automate the generation of a master tag index file, like
>>>> use.desc.local, this can be avoided.  emerge can simply go rummage through
>>>> the master index for matching tag entries instead of going through the
>>>> entire tree.  Because if we wanted to sift through the entire tree, grep
>>>> would be a far better method (compiled C and probably better text-matching
>>>> algorithms than emerge).
>>>
>>> And this goes pretty much backwards to what we were aiming at. We
>>> should finally kill use.desc.local, not get inspired by the redundancy.
>>
>> And what replaces it?  What differentiates a global USE flag that has
>> purpose across multiple packages (like 'ipv6') against a flag that only
>> exists for a single package?
>
> Applications are supposed to read metadata.xml for local flags. That's
> all about it. Having an extra index file doesn't really make sense
> there.

But they don't currently, do they?  As far as I know, most everything parses
the use.local.desc file.  Wouldn't having portage apps read/parse every
package's metadata.xml file introduce a lot of disk I/O to seek out those
files across the entire tree?  That would seem like a bigger step backwards
if so.


>>>>> Worse than that, your GLEP doesn't even have any basic rules for naming
>>>>> tags -- like what language form to use and, say, which character to use
>>>>> instead of space. This sounds like the sort of things that's going to
>>>>> make it even harder to get some consistency, especially if some
>>>>> developers are going to follow someone else committing earlier and some
>>>>> will follow their own rules.
>>>>
>>>> Easy: ASCII, alphanumeric only, must start with a letter, lowercase, no
>>>> spaces.  A lot of problems are avoided if we keep tags to one-word
>>>> descriptors only.  E.g., for mail clients, they would carry both 'mail' and
>>>> 'client' as two of their five tags.  For kmail, a third tag would be 'kde'
>>>> and Evolution would have 'gnome' instead.
>>>
>>> I'm pretty sure you will finally hit something that goes with two
>>> words. Protocol name or something.
>>
>> Perhaps, but we can fight that battle when we get there.  starting off with
>> one-word tags keeps things simple for now and that'll make it easier to
>> determine whether this experiment actually pans out or not.
>
> If you introduce arbitrary limitations, people will either find a way
> around them (which means getting even worse mess) or omit some tags.
> Either way, tags become less helpful.

Everything trends towards greater entropy, whether we like it or not.
Portage started with the basic idea of Ports, but it's grown way beyond that
over the years.  USE flags were supposed to be simple switches for
controlling compile-time functionality, emerge used to be the only package
manager, and Gentoo used to only support the Linux kernel and sysvinit scripts.

Whatever implementation of tags is adopted, if any, will eventually grow
beyond its original design parameters.  If tags are not adopted, something
else will probably get proposed and adopted down the road that will outgrow
its design parameters.  The question is, are tags the best we can do *now*,
or do we wait for some better idea to appear down the road and then go with
that instead?


>>>> I'd also suggest that 'all' be considered a default, global tag for all
>>>> packages, it be a reserved tag internal to emerge and other package
>>>> managers, and not count against the number of allowed tags (meaning that
>>>> technically, a package is allow five tags + 'all').
>>>>
>>>> As for default tags when a package does not define any, the package category
>>>> gets split at the hyphen and becomes two independent tags.  This is
>>>> overridden when at least one tag is defined in metadata.xml.
>>>
>>> Will this have a real benefit? Sounds like unnecessary confusion for
>>> a minor gain to me.
>>
>> Which?  The internal 'all' tag or the use of existing category names as a
>> default set of tags for packages that don't have any tags defined?
>
> The 'all' tag sounds like something that would have no value.

Okay, let's ignore that then.  I'm just brainstorming -- not every idea has
worth or merit.


> The automagic tags sound like a way to confuse people -- yesterday it
> had this tag, now I wanted to add a new one and the old tag
> disappeared! Not to mention sometimes the categories don't give really
> useful tags. Tags are not replacing categories, so no point in trying
> to bind the two together.

I am not suggesting that tags replace categories.  Categories were the
original way to group packages (again, deriving from how Ports does it), so
when no tags are defined for a package, they offer a somewhat-suitable
fill-in.  That's not binding the two in any direct way, it's just offering a
default/fallback set of tags until a package maintainer updates metadata.xml
to add actual tag definitions.

Sample python pseudocode:

if not package.tags:
    package.tags = package.category.split('-')

If you have a better idea, I am definitely all ears.
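
Fleshed out a bit so it runs standalone (effective_tags is a made-up name
for illustration, not anything in Portage):

```python
def effective_tags(category, explicit_tags=()):
    """Explicit tags win; otherwise fall back to the category split at '-'."""
    if explicit_tags:
        return list(explicit_tags)
    return category.split("-")


print(effective_tags("dev-python"))               # ['dev', 'python']
print(effective_tags("www-client"))               # ['www', 'client']
print(effective_tags("www-client", ["browser"]))  # ['browser']
```

The last call shows the override: as soon as one explicit tag is defined, the
category-derived tags are no longer reported.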

--
Joshua Kinard
Gentoo/MIPS
[hidden email]
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


Re: RFC GLEP 1005: Package Tags

Kent Fredric

On 24 March 2014 11:54, Joshua Kinard <[hidden email]> wrote:

> That said, Is XML that specific that every single atom has to be wrapped by
> an individual tag?  A comma-separated list of values in its own XML tag is
> prohibited by the spec?  I don't use XML often (if at all), so I am not
> familiar with its intrinsics.


By nesting CSV inside XML, you've now got 2 formats to deal with instead of 1.

In pure XML, you can get a properly decoded array of tag elements with a simple XPath query:

          //tag

But with CSV-in-a-tag you have to extract the tag and subsequently parse it.

So you're hand implementing a parser to parse parts of XML that already convey data without needing to hand-parse.

Which is more effort for everyone who touches the file, not less.

Add to that automated ways to update the tags (again, having to implement a custom serialiser in addition to the custom parser) and it's just not worth the tiny amount of savings.

Because really, if space efficiency were the #1 priority, we'd not be using XML at all, let alone XML with pesky whitespace indentation that consumes needless bytes. =)
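
To illustrate the difference, a small sketch using Python's standard xml.etree.ElementTree. The sample documents are made up, and ElementTree only supports a subset of XPath, so the query on an element is written as ".//tag" rather than "//tag".

```python
import xml.etree.ElementTree as ET

nested = ET.fromstring("<tags><tag>one</tag><tag>two</tag></tags>")
csv_in_tag = ET.fromstring("<tags>one, two,three</tags>")

# Pure XML: the XML parser already hands back one element per tag.
from_xml = [element.text for element in nested.findall(".//tag")]

# CSV-in-a-tag: a second parsing step on top of the XML one, including
# our own decisions about whitespace and empty entries.
from_csv = [
    part.strip()
    for part in (csv_in_tag.text or "").split(",")
    if part.strip()
]

print(from_xml)  # ['one', 'two']
print(from_csv)  # ['one', 'two', 'three']
```

The split itself is short, but every tool that reads or writes the file has to agree on the same whitespace and escaping rules, which is the second format in question.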

--
Kent

Re: RFC GLEP 1005: Package Tags

Joshua Kinard-2
On 03/23/2014 19:18, Kent Fredric wrote:

> On 24 March 2014 11:54, Joshua Kinard <[hidden email]> wrote:
>
>> That said, Is XML that specific that every single atom has to be wrapped by
>> an individual tag?  A comma-separated list of values in its own XML tag is
>> prohibited by the spec?  I don't use XML often (if at all), so I am not
>> familiar with its intrinsics.
>>
>
>
> By nesting CSV inside XML, you've now got 2 formats to deal with instead of
> 1.
>
> In pure XML, you can get a properly decoded array of tag elements with a
> simple XPath query:
>
>           //tag
>
> But with CSV-in-a-tag you have to extract the tag and subsequently parse it.

I am probably thinking from a Python perspective then.  All you have to do
is grab the value of <tags> and then split it on the comma.  No custom
parsing needed, since that function is built into Python.  I guess this
might not be the case with other languages, though, and it really just adds
to my distaste of XML as a format for metadata.xml in the first place.


> So you're hand implementing a parser to parse parts of XML that already
> convey data without needing to hand-parse.
>
> Which is more effort for everyone who touches the file, not less.
>
> Add to that automated ways to update the tags ( again, having to implement
> a custom serialiser in addition to the custom parser ) and its just not
> worth the tiny amount of savings.
>
> Because really, if space efficiency was #1 priority, we'd not be using XML
> at all, let alone XML with pesky whitespace indentation that consumes
> needless bytes. =)

I guess I need to start looking for used TARDISes then...

Thanks for the explanation.

--
Joshua Kinard
Gentoo/MIPS
[hidden email]
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


Re: RFC GLEP 1005: Package Tags

Tom Wijsman-2
In reply to this post by Alan McKinnon-2
On Sun, 23 Mar 2014 23:47:22 +0200
Alan McKinnon <[hidden email]> wrote:

> Tags work best when they describe narrow, clearly defined attributes,
> and the thing they are applied to can have one, two or more of these
> attributes or sometimes even none. Music and movie genres are an
> excellent example - there are only so many of them and for the most
> part one can tell whether a tag really is a genre or not.

There are more ways to search for music or a movie than by genre:

What mood is it in? What are key elements of its plot or lyrics?
Where does it take place? For which audience is it meant? Which praises
has it received? What kind of style is it made in? What is it based on?
What is the attitude of it? What looks or effects does it have? Is it
appropriate for children? Does it contain explicit things?

Let's do this for movies. I'm looking for a ...

... serial killer (key element) that is scary (mood)?
    Carrie, Halloween, Saw, Scream, ...

... musical (genre) that makes one feel good (mood)?
    Aaja Nachle, Frozen, Grease, The Sound of Music, ...

... good versus evil (plot) based on comics (based on)?
    Batman, Sin City, Superman, The Avengers, ...

... goofy (attitude) hero (key element) where nothing goes right (plot)?
    Due Date, Fawlty Towers, Monty Python's Flying Circus, Mr Bean, ...

These are results from an actual movie recommendation site; similarly,
the same exists for music too, where you can for example look for a
female American singer-songwriter singing catchy contemporary country.

Getting back to Gentoo; when I would look for some package, I want it to
be lightweight, do audio recordings, organize these audio recordings
and do effects on these audio recordings. So, I'll be looking for tags
like "lightweight, audio-recording, file-organization, sound-effects";
if that's too broad, I can take two of them and test some of that.

Thinking about the different types of things to search for; I'm
thinking about ...

... what the characteristics of the software are
    (light/heavy, new/old, extensible/modular/nonstandard, ...),

... what the software can do (record audio, organize files, ...),

... what category (browser, development, DAW software, utility, ...),

... what kind of interface the software has to me (CLI, GUI, ...),

... what interconnectivity the software has (internet, bluetooth, ...),

... and so on ...

We could make a list of types (some already mentioned above) and a list
of possible tags for that type to shape the tag system somewhat.
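
As a rough sketch of what such a typed list could look like, the Python below
maps each tag type to its allowed tags and reports tags that are not in the
vocabulary. The type names, the tag spellings and the helper functions are
assumptions made up for this example, not an agreed vocabulary.

```python
# Hypothetical vocabulary: tag type -> allowed tags, loosely based on the
# kinds of things listed above. None of this is an agreed-upon list.
VOCABULARY = {
    "characteristic": {"lightweight", "extensible", "modular"},
    "function": {"audio-recording", "file-organization", "sound-effects"},
    "interface": {"cli", "gui"},
}


def tag_type(tag, vocabulary=VOCABULARY):
    """Return the type a tag belongs to, or None if it is not in the list."""
    for type_name, tags in vocabulary.items():
        if tag in tags:
            return type_name
    return None


def unknown_tags(tags, vocabulary=VOCABULARY):
    """Return the tags that are not part of the vocabulary, sorted."""
    return sorted(t for t in tags if tag_type(t, vocabulary) is None)
```

A check like unknown_tags() could be run over proposed tags to flag anything
that has not yet been given a type.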

--
With kind regards,

Tom Wijsman (TomWij)
Gentoo Developer

E-mail address  : [hidden email]
GPG Public Key  : 6D34E57D
GPG Fingerprint : C165 AF18 AB4C 400B C3D2  ABF0 95B2 1FCD 6D34 E57D


Re: RFC GLEP 1005: Package Tags

Jeroen Roovers-3
In reply to this post by Alexander Berntsen-2
On Sun, 23 Mar 2014 16:03:38 +0100
Alexander Berntsen <[hidden email]> wrote:

> On 23/03/14 15:46, Jeroen Roovers wrote:
> > "This GLEP author would love to blight categories out of gentoo
> > history as a giant mistake."

That's not what I wrote. It's a quotation.

> It does not matter. Just remove that line. It is irrelevant.

The point in asking why it's there was to establish why the GLEP as a
whole is relevant. In other words: it would be trivial[1] yet
painstaking[2] to establish an alternative means to address the package
manager to package targets, but why would we want to do it?

Examples of where atoms fail and where tags do better could enlighten
us.


     jer


[1] Set up a PM wrapper that translates tags into atoms.
[2] Set up a database of such translations, with a really easy
    fail-over to ordinary atoms where the database is incomplete.
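
A minimal Python sketch of [1] and [2] together, under loud assumptions: the
"tag:" argument prefix, the in-memory database and the function name are all
invented for illustration, and the atoms listed are only sample data.

```python
# Hypothetical tag -> atoms database; where real data would come from is
# exactly the open question. The entries are sample data only.
TAG_DB = {
    "terminal": ["x11-terms/xterm", "kde-base/konsole"],
}
TAG_PREFIX = "tag:"  # Assumed syntax for marking an argument as a tag.


def expand_targets(args, tag_db=TAG_DB):
    """Replace known tag arguments with atoms; pass everything else through."""
    targets = []
    for arg in args:
        if arg.startswith(TAG_PREFIX):
            tag = arg[len(TAG_PREFIX):]
            # Fail over to the bare word as an ordinary atom if the tag is
            # missing from the database.
            targets.extend(tag_db.get(tag, [tag]))
        else:
            targets.append(arg)
    return targets
```

A wrapper would then hand the expanded list to the package manager unchanged.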


Re: RFC GLEP 1005: Package Tags

Wyatt Epp
In reply to this post by Alec Warner-2
On Sat, Mar 22, 2014 at 6:33 PM, Alec Warner <[hidden email]> wrote:
> https://wiki.gentoo.org/wiki/Package_Tags
>
Ack, this had to happen on a weekend when I wasn't paying attention!
And you beat me to it, too-- I was working on something in this vein,
but wasn't quite satisfied with the design yet.  Oh well.  You're sort
of on the right track, but there are some very important aspects
missing that will make the whole thing collapse in their absence.
(This thread has been in various places, but I frankly don't feel like
finding the relevant snippets, so you get a text dump.  Sorry about
that.)

The first thing missing is aliasing (most proposals for this sort of
system miss this at first; don't feel too bad).  There are many, many,
many cases where you want more than one single tag query to resolve to
the same canonical tag.  The ability to define aliases that take care
of this automatically is critical.  In my notes on this, I had a
global alias file, and users can have an /etc/portage/tag.alias.  It's
just text -- nothing special -- that defines antecedent = consequent
relationships.  This means the antecedent is _replaced_ by the
consequent.  As a quick example, cpp = c++.  This also allows for simple
changes to the canonical name.
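
For illustration only, alias resolution along these lines could be sketched in
Python as follows. The function names are hypothetical, only the simple
two-term "antecedent = consequent" form is handled, and treating "#" as a
comment marker is an assumption borrowed from the excerpt further down.

```python
def parse_aliases(lines):
    """Parse 'antecedent = consequent' lines into a dict.

    Assumes '#' starts a comment, so a tag containing '#' would not survive.
    """
    aliases = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        antecedent, sep, consequent = line.partition("=")
        if not sep:
            continue  # Blank line, comment-only line, or no '=' at all.
        aliases[antecedent.strip()] = consequent.strip()
    return aliases


def canonical(tag, aliases):
    """Follow aliases until a canonical tag is reached; stop on a cycle."""
    seen = set()
    while tag in aliases and tag not in seen:
        seen.add(tag)
        tag = aliases[tag]
    return tag
```

Renaming a canonical tag then only needs one new alias line from the old name
to the new one. A user file could be layered over the global one by parsing
both and letting the user's entries overwrite the global dict, though that
precedence is a guess on my part.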

Second, implication is important for decreasing maintenance burden.
An implication is an antecedent -> consequent relationship where the
consequent is automatically added if the antecedent is present.
Unlike aliasing, the consequent doesn't _replace_ the antecedent.  An
example of this is acpi -> power_management, because acpi is a
distinct aspect of power management, and has value on its own.  Over
time, this significantly lowers the maintenance burden of an expanding
vocabulary and tree.
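
A small Python sketch of how implications could be applied, assuming they are
stored as a mapping from antecedent to a list of consequents. The function
name and the storage format are made up here; only the acpi example comes
from the text above.

```python
def apply_implications(tags, implications):
    """Return the given tags plus every consequent reachable from them."""
    result = set(tags)
    pending = list(result)
    while pending:
        tag = pending.pop()
        for consequent in implications.get(tag, ()):
            # The antecedent stays; the consequent is added alongside it.
            if consequent not in result:
                result.add(consequent)
                pending.append(consequent)
    return result


IMPLICATIONS = {"acpi": ["power_management"]}
print(sorted(apply_implications({"acpi"}, IMPLICATIONS)))
```

Because already-seen tags are skipped, chains of implications are followed to
the end and a circular pair of implications cannot loop forever.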

With that in place, I want to make something clear: consistency in the
vocabulary is absolutely critical.  I cannot overemphasise how
important this is.  Adding tags without any sort of discipline leads
to an unmaintainable vocabulary, which makes the whole thing as
worthless as some people think.  So there needs to be some sort of basic
canonical list of tags with their descriptions, and yes people should
be expected to be rigorous in how they approach this.  I've attached
a rough draft of descriptions and aliases that I pulled together a
while ago (analogous to /etc/portage/profiles/use.desc).

This is where aliasing becomes essential, because it allows us to
guarantee some amount of consistency.  We're only human and can't be
expected to cover every situation, but there's plenty of low-hanging
fruit in this area.  e.g.:
app = application                       # Alias abbreviation to full tag
editors = editor                        # Make plural -> singular aliases
                                        # standard where sensible.
                                        # Rule of thumb 1: "This is a(n)..."
admin = administration                  # Rule of thumb 2: "This is a(n)... ...tool"
backup = back-up                        # Can use hyphenated forms
benchmark = benchmarking                # As with admin, only gerund form.
cdr = disk_authoring                    # Spaces replaced with underscores at
                                        # word boundaries
i18n = internationalisation             # Will need to come to a consensus on
                                        # the s/z spelling and make some aliases.
cpp = c++                               # Valid tags should be restricted to
                                        # basic ASCII minus spaces (replaced
                                        # with underscores) for our own sanity
.net = dotnet                           # This could go either way, but the
                                        # leading period makes my Unix blood
                                        # distrust it.
gamedev = game_development              # "games" becomes ambiguous with "game"
                                        # so prefer a more-clear form.
lang = language = programming_language  # Not to be confused with the i18n
                                        # language support. Avoid confusion
                                        # with clear naming
version_control = source_control = vcs  # Well known abbreviations can be used
                                        # in place of their expansions
mail = email                            # No sense not being clear
mail_server = mail_transfer_agent = mta # Multiple aliases to the same thing
                                        # are acceptable
nntp = {{newsreader usenet}}            # The braced notation denotes an
                                        # intersection of two tags.  Need to
                                        # decide if this sort of alias is
                                        # legal.  I'm thinking no, honestly.
sys = system                            # BUT it's in conflict with @system!
                                        # Don't do that.
www = web                               # These are all things that deal with
                                        # the web specifically.
apache = apache_module                  # classes of packages that have their
                                        # own categories is exactly why this
                                        # is a good idea.

The above is just an excerpt copied directly from my notes on
aliasing.  Some other stuff:
- Query syntax and semantics can be addressed in greater detail later.
 There's some nice sugar to be had here.
- Likewise, tools.  Something along the lines of quse and equery would
be handy in support of this.
- Aliases for reasonable search terms are not a bad idea.
- I've stated at various points in the past, but categories are
already tags after a fashion.  They're not very good ones, but they're
a good place to start.  Moreover, current metapackages and sets are
somewhat like tags in their own right.
- USEs might also be considered as a source of inspiration. That said,
I don't think anything like conditional tags based on the profile's
selected USE is a good idea.  Don't make this more complex than it is.
- Succinctly, strongly hierarchical tags are a mistake and will cause
you more grief than you can imagine.  Ontologically, aim for "mostly
flat".
- Limiting the number of tags allowed on a package is a horrible idea;
seriously, don't even consider that-- you would absolutely regret it.
The whole point of this is to allow useful semantic description.
- Crowdsourcing is something that _can_ work, but needs to be
moderated in some way.  It could work well to deputise some trusted
users for this task, similar to arch testing, and give them a mandate to
do responsible tag gardening.
- A good maxim for additions is "tag what you see".  If it provides a
library with Lua bindings, then that's probably a good thing to tag.
- Maintainers can be awfully possessive of their packages, but on this
subject I think it would benefit them to unclench a little.  Most
additions should be relatively obvious.
- Per-$PV tagging is honestly probably not necessary.  Sticking it in
metadata.xml seems reasonable for now.

Regards,
Wyatt

alias.desc (4K) Download Attachment
tags.desc (14K) Download Attachment

Re: RFC GLEP 1005: Package Tags

Alan McKinnon-2
In reply to this post by Tom Wijsman-2
On 24/03/2014 02:43, Tom Wijsman wrote:

> On Sun, 23 Mar 2014 23:47:22 +0200
> Alan McKinnon <[hidden email]> wrote:
>
>> Tags work best when they describe narrow, clearly defined attributes,
>> and the thing they are applied to can have one, two or more of these
>> attributes or sometimes even none. Music and movie genres are an
>> excellent example - there are only so many of them and for the most
>> part one can tell whether a tag really is a genre or not.
>
> There are more ways to search for a music or a movie than a genre:

Genre was just one example of tag usage for illustration. Doesn't mean
there aren't other equally good or valid examples.

>
> What mood is it in? What are key elements of its plot or lyrics?
> Where does it take place? For which audience is it meant? Which praises
> has it received? What kind of style is it made in? What is it based on?
> What is the attitude of it? What looks or effects does it have? Is it
> appropriate for children? Does it contain explicit things?
>
> Let's do this for movies. I'm looking for a ...
>
> ... serial killer (key element) that is scary (mood)?
>     Carrie, Halloween, Saw, Scream, ...
>
> ... musical (genre) that makes one feel good (mood)?
>     Aaja Nachle, Frozen, Grease, The Sound of Music, ...
>
> ... good versus evil (plot) based on comics (based on)?
>     Batman, Sin City, Superman, The Avengers, ...
>
> ... goofy (attitude) hero (key element) where nothing goes right (plot)?
>     Due Date, Fawlty Towers, Monty Python's Flying Circus, Mr Bean, ...
>
> These are results from an actual movie recommendation site; similarly,
> the same exists for music too, where you can for example look for a
> female American singer-songwriter singing catchy contemporary country.
>
> Getting back to Gentoo; when I would look for some package, I want it to
> be lightweight, do audio recordings, organize these audio recordings
> and do effects on these audio recordings. So, I'll be looking for tags
> like "lightweight, audio-recording, file-organization, sound-effects";
> if that's too broad, I can take two of them and test some of that.
>
> Thinking about the different types of things to search for; I'm
> thinking about ...
>
> ... what the characteristics of the software are
>     (light/heavy, new/old, extensible/modular/nonstandard, ...),
>
> ... what the software can do (record audio, organize files, ...),
>
> ... what category (browser, development, DAW software, utility, ...),
>
> ... what kind of interface the software has to me (CLI, GUI, ...),
>
> ... what interconnectivity the software has (internet, bluetooth, ...),
>
> ... and so on ...
>
> We could make a list of types (some already mentioned above) and a list
> of possible tags for that type to shape the tag system somewhat.

Have you considered just how much heavy lifting that is? Who is going to
compile the list of tags? Who is going to approve/disapprove taggable
attributes and the tags themselves? How will you resolve disagreements
people have?

What about the case of a package maintainer that simply can't be
bothered doing tags at all?

I'm not against tagging per se, they can be useful. But they do have to
be strictly controlled otherwise things get out of hand very quickly.
Every case I've seen of software that uses a freeform tagging mechanism
fails almost instantly as it becomes very inconsistent. I have one of
these apps in a corporate setting right now, have you any idea how many
ways people can come up with to tag the concept of "cloud"? I have tags
in there where someone translated the word "cloud" to a different
language! It sounded like a good idea at the time to them....

All in all, tagging is a huge amount of work and the odds of failure are
high. People need to be aware of this reality.

Wyatt Epp's post at 03:25 expresses very nicely in a more formal
language what I'm saying.


--
Alan McKinnon
[hidden email]



Re: RFC GLEP 1005: Package Tags

Tom Wijsman-2
On Mon, 24 Mar 2014 09:32:40 +0200
Alan McKinnon <[hidden email]> wrote:

> On 24/03/2014 02:43, Tom Wijsman wrote:
> > On Sun, 23 Mar 2014 23:47:22 +0200
> > Alan McKinnon <[hidden email]> wrote:
> >
> >> Tags work best when they describe narrow, clearly defined
> >> attributes, and the thing they are applied to can have one, two or
> >> more of these attributes or sometimes even none. Music and movie
> >> genres are an excellent example - there are only so many of them
> >> and for the most part one can tell whether a tag really is a genre
> >> or not.
> >
> > There are more ways to search for a music or a movie than a genre:
>
> Genre was just one example of tag usage for illustration. Doesn't mean
> there aren't other equally good or valid examples.

+1 Ah, in that case, what I've said backs up your thought. \o/
 
> > We could make a list of types (some already mentioned above) and a
> > list of possible tags for that type to shape the tag system
> > somewhat.
>
> Have you considered just how much heavy lifting that is? Who is going
> to compile the list of tags?

+1 Yes, it's why I've stated before this should be crowd sourced.

> Who is going to approve/disapprove taggable attributes and the tags
> themselves?

Approval by default (with a quick skim over it) where we disapprove
what's not appropriate once we spot it could work. The "tagging rules"
will make themselves here. Those who are interested could do it; that
is, I'd expect Alec to help out a bit, maybe I do too, maybe others?

> How will you resolve disagreements people have?

Discussion and/or votes.

> What about the case of a package maintainer that simply can't be
> bothered doing tags at all?

+1 [see crowd sourced idea]

> I'm not against tagging per se, they can be useful.

+1, same thought; it's nice to have, but it needs to be good to work.

> But they do have to be strictly controlled otherwise things get out
> of hand very quickly. Every case I've seen of software that uses a
> freeform tagging mechanism fails almost instantly as it becomes very
> inconsistent. I have one of these apps in a corporate setting right
> now, have you any idea how many ways people can come up with to tag
> the concept of "cloud"? I have tags in there where someone translated
> the word "cloud" to a different language! It sounded like a good idea
> at the time to them....
>
> All in all, tagging is a huge amount of work and the odds of failure
> are high. People need to be aware of this reality.

+1 As movie and music recommendation sites show, it can be made to
work; it did take them a while to get to that point, but doing the work
right from the start saves us from spending too much time on this.

> Wyatt Epp's post at 03:25 expresses very nicely in a more formal
> language what I'm saying.

+1

--
With kind regards,

Tom Wijsman (TomWij)
Gentoo Developer

E-mail address  : [hidden email]
GPG Public Key  : 6D34E57D
GPG Fingerprint : C165 AF18 AB4C 400B C3D2  ABF0 95B2 1FCD 6D34 E57D


Re: RFC GLEP 1005: Package Tags

yac
In reply to this post by Jeroen Roovers-3
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Sun, 23 Mar 2014 15:46:09 +0100
Jeroen Roovers <[hidden email]> wrote:

> On Sat, 22 Mar 2014 15:33:27 -0700
> Alec Warner <[hidden email]> wrote:
>
> > https://wiki.gentoo.org/wiki/Package_Tags
>
>    "This GLEP author would love to blight categories out of gentoo
>     history as a giant mistake."
>
> Why?
>
>
>      jer

Categories are essentially tags, only less powerful as they can express
a relationship of 1:N while tags can express M:N


- --
Jan Matějka        | Gentoo Developer
https://gentoo.org | Gentoo Linux
GPG: A33E F5BC A9F6 DAFD 2021  6FB6 3EBF D45B EEB6 CA8B
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQEcBAEBCgAGBQJTMBi2AAoJEIN+7RD5ejahpUsH/3UPmgwx9PxtEJqzsz4q0zKi
6hfzNgeULOia7n8zIqv+UKS82EgkG1XW16xFabTvuFBM+0cHagIuMmA9ViLC/gHw
DIcefzA9pOQ17Z+KpZJCWUQEzAjlAy2zrTtVaRihos/xo4I6y83tUxfIRq7v+0/e
HRY2A/YTY8/8O+FYf32GfVHbL+1h3rXQXSXz9rd6n+wICfojAzw6Ngnrmr1yZPkO
RRksZVYvmcHE2ve/CtGkmXYyr8qCs1n/gVdnl6M6Y3EBKopL+BgZkJpaDgWsinJj
xwimfUYJbt5GfbVzKINwd+0d8w7cq0vcHLR851/8xHmNXLc+L4MzRnwAXKPK3u8=
=s1CY
-----END PGP SIGNATURE-----

Re: RFC GLEP 1005: Package Tags

Jeroen Roovers-3
On Mon, 24 Mar 2014 12:36:19 +0100
Jan Matejka <[hidden email]> wrote:

> Categories are essentially tags, only less powerful as they can
> express a relationship of 1:N while tags can express M:N

No, categories are essentially directories.

I was asking about tags, not about categories.

It appears it's very hard to answer the simple questions of why we need
tags and how we would use them. The answers should typically involve
some explanation of how you're going to use the things once you have
them.


     jer


Re: RFC GLEP 1005: Package Tags

Damien Levac
On 14-03-24 10:25 AM, Jeroen Roovers wrote:

> On Mon, 24 Mar 2014 12:36:19 +0100
> Jan Matejka <[hidden email]> wrote:
>
>> Categories are essentially tags, only less powerful as they can
>> express a relationship of 1:N while tags can express M:N
> No, categories are essentially directories.
>
> I was asking about tags, not about categories.
>
> It appears it's very hard to answer the simple questions of why we need
> tags and how we would use them. The answers should typically involve
> some explanation of how you're going to use the things once you have
> them.
>
>
>       jer
>
A lot of people already replied to this question: package search.

A trivial example: a user wants to know all terminals available in
portage. Of course he could try an `emerge --searchdesc terminal`, but
then he would get anything mentioning terminal in the description: which
would probably include a lot of "terminal applications" which are not
terminals themselves...

`emerge --search terminal` just doesn't cut it as "konsole" wouldn't be
a result but is a terminal emulator...

On the other hand, terminals are spread through many categories
(gnome-terminal in gnome-base & konsole in kde-base to name the most
obvious example).

Thus tags are a nice way for users to find the applications they want.
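
As a rough Python sketch of that kind of search: the tag assignments below are
invented sample data, the category/package names simply follow the ones
mentioned above plus one extra for contrast, and search() is a hypothetical
helper rather than anything portage offers today.

```python
# Invented sample data; real tag assignments would live in metadata.xml or
# wherever the GLEP ends up putting them.
PACKAGE_TAGS = {
    "gnome-base/gnome-terminal": {"terminal", "gui"},
    "kde-base/konsole": {"terminal", "gui"},
    "app-misc/screen": {"terminal_multiplexer", "cli"},
}


def search(required, package_tags=PACKAGE_TAGS):
    """Return packages carrying every tag in `required`, sorted by name."""
    required = set(required)
    return sorted(
        pkg for pkg, tags in package_tags.items() if required <= tags
    )


print(search(["terminal"]))
```

The match is on the tag itself, so the package's category and the wording of
its description no longer matter, and adding more required tags narrows the
result.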

Damien


Re: RFC GLEP 1005: Package Tags

Jeroen Roovers-3
On Mon, 24 Mar 2014 10:55:38 -0400
Damien Levac <[hidden email]> wrote:

> A lot of people already replied to this question: package search.

I didn't ask for an explanation on the mailing list. I quoted [1]
because it needs to be more specific exactly where it needs to be more
specific. The GLEP still doesn't explain properly why it exists in the
first place.


     jer


[1] https://wiki.gentoo.org/wiki/Package_Tags


Re: RFC GLEP 1005: Package Tags

Ciaran McCreesh-4
In reply to this post by Damien Levac
On Mon, 24 Mar 2014 10:55:38 -0400
Damien Levac <[hidden email]> wrote:
> A lot of people already replied to this question: package search.

Sure, but can you point to prior examples of this kind of stuff
actually working?

--
Ciaran McCreesh


Re: RFC GLEP 1005: Package Tags

Damien Levac
On 14-03-24 12:28 PM, Ciaran McCreesh wrote:
> On Mon, 24 Mar 2014 10:55:38 -0400
> Damien Levac <[hidden email]> wrote:
>> A lot of people already replied to this question: package search.
> Sure, but can you point to prior examples of this kind of stuff
> actually working?
>
I have no example for package searching... However, it is used a lot in
multimedia search, since it is traditional to give tags to videos, for
example. If you want a "funny" example of tags, see the tags for anime on
anidb.net --- they allow users to easily find anime that contain
elements they enjoy seeing.

That being said, I am surprised that having no example showing it works
should be a deal breaker for trying it out. Wouldn't that mindset kill
innovation? Personally, I expect it to be not so great at the beginning,
as the tags chosen will most likely not be the cleverest ones on the first
try, and to get more and more useful as the tag conventions get better.

Such a feature could later be used to create applications like *gasp*
an intuitive GUI for portage or for statistical analysis of
packages...

Damien


Re: RFC GLEP 1005: Package Tags

Wyatt Epp
In reply to this post by Ciaran McCreesh-4
On Mon, Mar 24, 2014 at 12:28 PM, Ciaran McCreesh
<[hidden email]> wrote:
> On Mon, 24 Mar 2014 10:55:38 -0400
> Damien Levac <[hidden email]> wrote:
>> A lot of people already replied to this question: package search.
>
> Sure, but can you point to prior examples of this kind of stuff
> actually working?
>
eix -C allows you to search for categories.  It's horrendously
under-powered, but almost a useful prototype of what could be.

Pandora uses this general concept with superb granularity for graphing
similarities in music.  That the MGP data is only used for a streaming
service is depressing.

Alternativeto.net is software oriented and has a good bit of this.
Results?  http://alternativeto.net/tag/tiling/ Bam.  Tiling window
managers.  (These are almost certainly all user-sourced; notice the
innocent misuse in that list.)

The various Danbooru-style sites will generally show off impressive
community-sourced rigour as well as proving the efficacy of
alias/implication at scale.  I have a lot of respect for their
collective pep. Most are NSFW, but this one probably won't be (much):
http://safebooru.org/

The Library of Congress? (The modern library is practically built on
this sort of metadata.)

Regards,
Wyatt


Re: RFC GLEP 1005: Package Tags

Ciaran McCreesh-4
In reply to this post by Damien Levac
On Mon, 24 Mar 2014 13:31:43 -0400
Damien Levac <[hidden email]> wrote:
> That being said, I am surprised that having no example showing it
> works should be a deal breaker for trying it out. Wouldn't that
> mindset kill innovation?

I ask, because this isn't the first time "tags" have been proposed as
the magic solution to everything. Each previous time, everyone has had
a slightly different, incompatible idea of what "tags" are, what
they're supposed to do, and how they're supposed to do it. So I would
like to see someone explain in detail, and without glossing over the
inconvenient technicalities, just how "tags" will help with "searching".

> Such a feature could later be used to create applications like
> *gasp* an intuitive GUI for portage or for statistical
> analysis of packages...

Could, maybe. But the current lack of intuitive GUI and the lack of
statistical analysis both have absolutely nothing to do with not having
tags...

--
Ciaran McCreesh
