Add caching to a few commonly used functions

Add caching to a few commonly used functions

Chun-Yu Shei-2
Hi,

I was recently interested in whether portage could be sped up, since
dependency resolution can sometimes take a while on slower machines.
After generating some flame graphs with cProfile and vmprof, I found 3
functions which seem to be called extremely frequently with the same
arguments: catpkgsplit, use_reduce, and match_from_list.  In the first
two cases, it was simple to cache the results in dicts, while
match_from_list was a bit trickier, since it seems to be a requirement
that it return actual entries from the input "candidate_list".  I also
ran into some test failures if I did the caching after the
mydep.unevaluated_atom.use and mydep.repo checks towards the end of the
function, so the caching is only done up to just before that point.
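
For the first two, the change boils down to the classic dict memoization
pattern (illustrative sketch with made-up names, not the literal patch code):

_cache = {}

def cached_call(arg):
    result = _cache.get(arg)
    if result is None:
        result = expensive_computation(arg)  # stand-in for the real work
        _cache[arg] = result
    return result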

The catpkgsplit change seems to definitely be safe, and I'm pretty sure
the use_reduce one is too, since anything that could possibly change the
result is hashed.  I'm a bit less certain about the match_from_list one,
although all tests are passing.

With all 3 patches together, "emerge -uDvpU --with-bdeps=y @world"
speeds up from 43.53 seconds to 30.96 sec -- a 40.6% speedup.  "emerge
-ep @world" is just a tiny bit faster, going from 18.69 to 18.22 sec
(2.5% improvement).  Since the upgrade case is far more common, this
would really help in daily use, and it shaves about 30 seconds off
the time you have to wait to get to the [Yes/No] prompt (from ~90s to
60s) on my old Sandy Bridge laptop when performing normal upgrades.

Hopefully, at least some of these patches can be incorporated, and please
let me know if any changes are necessary.

Thanks,
Chun-Yu




[PATCH 1/3] Add caching to catpkgsplit function

Chun-Yu Shei-2
According to cProfile, catpkgsplit is called up to 1-5.5 million times
during "emerge -uDvpU --with-bdeps=y @world". Adding a dict to cache its
results reduces the time for this command from 43.53 -> 41.53 seconds --
a 4.8% speedup.
---
 lib/portage/versions.py | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/lib/portage/versions.py b/lib/portage/versions.py
index 0c21373cc..ffec316ce 100644
--- a/lib/portage/versions.py
+++ b/lib/portage/versions.py
@@ -312,6 +312,7 @@ def _pkgsplit(mypkg, eapi=None):
 
 _cat_re = re.compile('^%s$' % _cat, re.UNICODE)
 _missing_cat = 'null'
+_catpkgsplit_cache = {}
 
 def catpkgsplit(mydata, silent=1, eapi=None):
  """
@@ -331,6 +332,11 @@ def catpkgsplit(mydata, silent=1, eapi=None):
  return mydata.cpv_split
  except AttributeError:
  pass
+
+ cache_entry = _catpkgsplit_cache.get(mydata)
+ if cache_entry is not None:
+ return cache_entry
+
  mysplit = mydata.split('/', 1)
  p_split = None
  if len(mysplit) == 1:
@@ -343,6 +349,7 @@ def catpkgsplit(mydata, silent=1, eapi=None):
  if not p_split:
  return None
  retval = (cat, p_split[0], p_split[1], p_split[2])
+ _catpkgsplit_cache[mydata] = retval
  return retval
 
 class _pkg_str(_unicode):
--
2.27.0.212.ge8ba1cc988-goog
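
(Illustrative sanity check for the patch above -- assumes the patch is
applied and a portage checkout is importable; the cpv string is just an
example:)

from portage.versions import catpkgsplit

split = catpkgsplit("app-editors/vim-8.2.0814-r1")
assert split == ('app-editors', 'vim', '8.2.0814', 'r1')
# with the cache in place, a repeated call returns the very same tuple
assert catpkgsplit("app-editors/vim-8.2.0814-r1") is split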



[PATCH 2/3] Add caching to use_reduce function

Chun-Yu Shei-2
This function is called extremely frequently with similar arguments, so
this optimization reduces "emerge -uDvpU --with-bdeps=y @world" runtime from
43.5 -> 34.5s -- a 25.8% speedup.
---
 lib/portage/dep/__init__.py | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/lib/portage/dep/__init__.py b/lib/portage/dep/__init__.py
index 72988357a..df296dd81 100644
--- a/lib/portage/dep/__init__.py
+++ b/lib/portage/dep/__init__.py
@@ -404,6 +404,8 @@ def paren_enclose(mylist, unevaluated_atom=False, opconvert=False):
  mystrparts.append(x)
  return " ".join(mystrparts)
 
+_use_reduce_cache = {}
+
 def use_reduce(depstr, uselist=(), masklist=(), matchall=False, excludeall=(), is_src_uri=False, \
  eapi=None, opconvert=False, flat=False, is_valid_flag=None, token_class=None, matchnone=False,
  subset=None):
@@ -440,6 +442,27 @@ def use_reduce(depstr, uselist=(), masklist=(), matchall=False, excludeall=(), i
  @rtype: List
  @return: The use reduced depend array
  """
+ uselist_key = None
+ masklist_key = None
+ excludeall_key = None
+ subset_key = None
+ if uselist is not None:
+    uselist_key = tuple(uselist)
+ if masklist is not None:
+    masklist_key = tuple(masklist)
+ if excludeall is not None:
+    excludeall_key = tuple(excludeall)
+ if subset is not None:
+    subset_key = tuple(subset)
+ cache_key = (depstr, uselist_key, masklist_key, matchall, excludeall_key, \
+ is_src_uri, eapi, opconvert, flat, is_valid_flag, token_class, \
+ matchnone, subset_key)
+
+ cache_entry = _use_reduce_cache.get(cache_key)
+ if cache_entry is not None:
+ # The list returned by this function may be modified, so return a copy.
+ return cache_entry[:]
+
  if isinstance(depstr, list):
  if portage._internal_caller:
  warnings.warn(_("Passing paren_reduced dep arrays to %s is deprecated. " + \
@@ -767,6 +790,9 @@ def use_reduce(depstr, uselist=(), masklist=(), matchall=False, excludeall=(), i
  raise InvalidDependString(
  _("Missing file name at end of string"))
 
+ # The list returned by this function may be modified, so store a copy.
+ _use_reduce_cache[cache_key] = stack[0][:]
+
  return stack[0]
 
 def dep_opconvert(deplist):
--
2.27.0.212.ge8ba1cc988-goog
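
(Side note on the cache key above: the list-to-tuple conversion is what
makes the key hashable. Illustrative values:)

cache = {}
cache[("virtual/rust", ("abi_x86_64",))] = ["parsed", "result"]  # tuple key: works
try:
    cache[("virtual/rust", ["abi_x86_64"])] = []  # list inside the key
except TypeError as e:
    print(e)  # unhashable type: 'list'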



[PATCH 3/3] Add partial caching to match_from_list

Chun-Yu Shei-2
This function is called frequently with similar arguments, so cache as
much of the partial results as possible. It seems that "match_from_list"
must return a list containing actual entries from the "candidate_list" argument,
so we store frozensets in "_match_from_list_cache" and test items from
"candidate_list" for membership in these sets. The filtering performed
by the mydep.unevaluated_atom.use and mydep.repo checks towards the end
of the function is also not cached, since this causes some test failures.

This results in a reduction of "emerge -uDvpU --with-bdeps=y @world"
runtime from 43.53 -> 40.15 sec -- an 8.4% speedup.
---
 lib/portage/dep/__init__.py | 359 +++++++++++++++++++-----------------
 1 file changed, 189 insertions(+), 170 deletions(-)

diff --git a/lib/portage/dep/__init__.py b/lib/portage/dep/__init__.py
index df296dd81..dbd23bb23 100644
--- a/lib/portage/dep/__init__.py
+++ b/lib/portage/dep/__init__.py
@@ -2174,6 +2174,8 @@ def best_match_to_list(mypkg, mylist):
 
  return bestm
 
+_match_from_list_cache = {}
+
 def match_from_list(mydep, candidate_list):
  """
  Searches list for entries that matches the package.
@@ -2197,209 +2199,226 @@ def match_from_list(mydep, candidate_list):
  if not isinstance(mydep, Atom):
  mydep = Atom(mydep, allow_wildcard=True, allow_repo=True)
 
- mycpv     = mydep.cpv
- mycpv_cps = catpkgsplit(mycpv) # Can be None if not specific
- build_id  = mydep.build_id
+ cache_key = (mydep, tuple(candidate_list))
+ key_has_hash = True
+ cache_entry = None
+ if mydep.build_id is None and key_has_hash:
+ try:
+ cache_entry = _match_from_list_cache.get(cache_key)
+ except TypeError:
+ key_has_hash = False
 
- if not mycpv_cps:
- ver = None
- rev = None
- else:
- cat, pkg, ver, rev = mycpv_cps
- if mydep == mycpv:
- raise KeyError(_("Specific key requires an operator"
- " (%s) (try adding an '=')") % (mydep))
-
- if ver and rev:
- operator = mydep.operator
- if not operator:
- writemsg(_("!!! Invalid atom: %s\n") % mydep, noiselevel=-1)
- return []
+ if cache_entry is not None:
+ # Note: the list returned by this function must contain actual entries
+ # from "candidate_list", so store frozensets in "_match_from_list_cache"
+ # and test items from "candidate_list" for membership in these sets.
+ mylist = [x for x in candidate_list if x in cache_entry]
  else:
- operator = None
-
- mylist = []
+ mycpv     = mydep.cpv
+ mycpv_cps = catpkgsplit(mycpv) # Can be None if not specific
+ build_id  = mydep.build_id
 
- if mydep.extended_syntax:
+ if not mycpv_cps:
+ ver = None
+ rev = None
+ else:
+ cat, pkg, ver, rev = mycpv_cps
+ if mydep == mycpv:
+ raise KeyError(_("Specific key requires an operator"
+ " (%s) (try adding an '=')") % (mydep))
 
- for x in candidate_list:
- cp = getattr(x, "cp", None)
- if cp is None:
- mysplit = catpkgsplit(remove_slot(x))
- if mysplit is not None:
- cp = mysplit[0] + '/' + mysplit[1]
+ if ver and rev:
+ operator = mydep.operator
+ if not operator:
+ writemsg(_("!!! Invalid atom: %s\n") % mydep, noiselevel=-1)
+ return []
+ else:
+ operator = None
 
- if cp is None:
- continue
+ mylist = []
 
- if cp == mycpv or extended_cp_match(mydep.cp, cp):
- mylist.append(x)
+ if mydep.extended_syntax:
 
- if mylist and mydep.operator == "=*":
+ for x in candidate_list:
+ cp = getattr(x, "cp", None)
+ if cp is None:
+ mysplit = catpkgsplit(remove_slot(x))
+ if mysplit is not None:
+ cp = mysplit[0] + '/' + mysplit[1]
 
- candidate_list = mylist
- mylist = []
- # Currently, only \*\w+\* is supported.
- ver = mydep.version[1:-1]
+ if cp is None:
+ continue
 
- for x in candidate_list:
- x_ver = getattr(x, "version", None)
- if x_ver is None:
- xs = catpkgsplit(remove_slot(x))
- if xs is None:
- continue
- x_ver = "-".join(xs[-2:])
- if ver in x_ver:
+ if cp == mycpv or extended_cp_match(mydep.cp, cp):
  mylist.append(x)
 
- elif operator is None:
- for x in candidate_list:
- cp = getattr(x, "cp", None)
- if cp is None:
- mysplit = catpkgsplit(remove_slot(x))
- if mysplit is not None:
- cp = mysplit[0] + '/' + mysplit[1]
+ if mylist and mydep.operator == "=*":
 
- if cp is None:
- continue
+ candidate_list = mylist
+ mylist = []
+ # Currently, only \*\w+\* is supported.
+ ver = mydep.version[1:-1]
 
- if cp == mydep.cp:
- mylist.append(x)
+ for x in candidate_list:
+ x_ver = getattr(x, "version", None)
+ if x_ver is None:
+ xs = catpkgsplit(remove_slot(x))
+ if xs is None:
+ continue
+ x_ver = "-".join(xs[-2:])
+ if ver in x_ver:
+ mylist.append(x)
 
- elif operator == "=": # Exact match
- for x in candidate_list:
- xcpv = getattr(x, "cpv", None)
- if xcpv is None:
- xcpv = remove_slot(x)
- if not cpvequal(xcpv, mycpv):
- continue
- if (build_id is not None and
- getattr(xcpv, "build_id", None) != build_id):
- continue
- mylist.append(x)
+ elif operator is None:
+ for x in candidate_list:
+ cp = getattr(x, "cp", None)
+ if cp is None:
+ mysplit = catpkgsplit(remove_slot(x))
+ if mysplit is not None:
+ cp = mysplit[0] + '/' + mysplit[1]
 
- elif operator == "=*": # glob match
- # XXX: Nasty special casing for leading zeros
- # Required as =* is a literal prefix match, so can't
- # use vercmp
- myver = mycpv_cps[2].lstrip("0")
- if not myver or not myver[0].isdigit():
- myver = "0"+myver
- if myver == mycpv_cps[2]:
- mycpv_cmp = mycpv
- else:
- # Use replace to preserve the revision part if it exists
- # (mycpv_cps[3] can't be trusted because in contains r0
- # even when the input has no revision part).
- mycpv_cmp = mycpv.replace(
- mydep.cp + "-" + mycpv_cps[2],
- mydep.cp + "-" + myver, 1)
- for x in candidate_list:
- try:
- x.cp
- except AttributeError:
- try:
- pkg = _pkg_str(remove_slot(x))
- except InvalidData:
+ if cp is None:
  continue
- else:
- pkg = x
 
- xs = pkg.cpv_split
- myver = xs[2].lstrip("0")
- if not myver or not myver[0].isdigit():
- myver = "0"+myver
- if myver == xs[2]:
- xcpv = pkg.cpv
- else:
- # Use replace to preserve the revision part if it exists.
- xcpv = pkg.cpv.replace(
- pkg.cp + "-" + xs[2],
- pkg.cp + "-" + myver, 1)
- if xcpv.startswith(mycpv_cmp):
- # =* glob matches only on boundaries between version parts,
- # so 1* does not match 10 (bug 560466).
- next_char = xcpv[len(mycpv_cmp):len(mycpv_cmp)+1]
- if (not next_char or next_char in '._-' or
- mycpv_cmp[-1].isdigit() != next_char.isdigit()):
+ if cp == mydep.cp:
  mylist.append(x)
 
- elif operator == "~": # version, any revision, match
- for x in candidate_list:
- xs = getattr(x, "cpv_split", None)
- if xs is None:
- xs = catpkgsplit(remove_slot(x))
- if xs is None:
- raise InvalidData(x)
- if not cpvequal(xs[0]+"/"+xs[1]+"-"+xs[2], mycpv_cps[0]+"/"+mycpv_cps[1]+"-"+mycpv_cps[2]):
- continue
- if xs[2] != ver:
- continue
- mylist.append(x)
+ elif operator == "=": # Exact match
+ for x in candidate_list:
+ xcpv = getattr(x, "cpv", None)
+ if xcpv is None:
+ xcpv = remove_slot(x)
+ if not cpvequal(xcpv, mycpv):
+ continue
+ if (build_id is not None and
+ getattr(xcpv, "build_id", None) != build_id):
+ continue
+ mylist.append(x)
 
- elif operator in (">", ">=", "<", "<="):
- for x in candidate_list:
- if hasattr(x, 'cp'):
- pkg = x
+ elif operator == "=*": # glob match
+ # XXX: Nasty special casing for leading zeros
+ # Required as =* is a literal prefix match, so can't
+ # use vercmp
+ myver = mycpv_cps[2].lstrip("0")
+ if not myver or not myver[0].isdigit():
+ myver = "0"+myver
+ if myver == mycpv_cps[2]:
+ mycpv_cmp = mycpv
  else:
+ # Use replace to preserve the revision part if it exists
+ # (mycpv_cps[3] can't be trusted because in contains r0
+ # even when the input has no revision part).
+ mycpv_cmp = mycpv.replace(
+ mydep.cp + "-" + mycpv_cps[2],
+ mydep.cp + "-" + myver, 1)
+ for x in candidate_list:
  try:
- pkg = _pkg_str(remove_slot(x))
- except InvalidData:
- continue
+ x.cp
+ except AttributeError:
+ try:
+ pkg = _pkg_str(remove_slot(x))
+ except InvalidData:
+ continue
+ else:
+ pkg = x
+
+ xs = pkg.cpv_split
+ myver = xs[2].lstrip("0")
+ if not myver or not myver[0].isdigit():
+ myver = "0"+myver
+ if myver == xs[2]:
+ xcpv = pkg.cpv
+ else:
+ # Use replace to preserve the revision part if it exists.
+ xcpv = pkg.cpv.replace(
+ pkg.cp + "-" + xs[2],
+ pkg.cp + "-" + myver, 1)
+ if xcpv.startswith(mycpv_cmp):
+ # =* glob matches only on boundaries between version parts,
+ # so 1* does not match 10 (bug 560466).
+ next_char = xcpv[len(mycpv_cmp):len(mycpv_cmp)+1]
+ if (not next_char or next_char in '._-' or
+ mycpv_cmp[-1].isdigit() != next_char.isdigit()):
+ mylist.append(x)
 
- if pkg.cp != mydep.cp:
- continue
- try:
- result = vercmp(pkg.version, mydep.version)
- except ValueError: # pkgcmp may return ValueError during int() conversion
- writemsg(_("\nInvalid package name: %s\n") % x, noiselevel=-1)
- raise
- if result is None:
- continue
- elif operator == ">":
- if result > 0:
- mylist.append(x)
- elif operator == ">=":
- if result >= 0:
- mylist.append(x)
- elif operator == "<":
- if result < 0:
- mylist.append(x)
- elif operator == "<=":
- if result <= 0:
- mylist.append(x)
- else:
- raise KeyError(_("Unknown operator: %s") % mydep)
- else:
- raise KeyError(_("Unknown operator: %s") % mydep)
+ elif operator == "~": # version, any revision, match
+ for x in candidate_list:
+ xs = getattr(x, "cpv_split", None)
+ if xs is None:
+ xs = catpkgsplit(remove_slot(x))
+ if xs is None:
+ raise InvalidData(x)
+ if not cpvequal(xs[0]+"/"+xs[1]+"-"+xs[2], mycpv_cps[0]+"/"+mycpv_cps[1]+"-"+mycpv_cps[2]):
+ continue
+ if xs[2] != ver:
+ continue
+ mylist.append(x)
 
- if mydep.slot is not None:
- candidate_list = mylist
- mylist = []
- for x in candidate_list:
- x_pkg = None
- try:
- x.cpv
- except AttributeError:
- xslot = dep_getslot(x)
- if xslot is not None:
+ elif operator in (">", ">=", "<", "<="):
+ for x in candidate_list:
+ if hasattr(x, 'cp'):
+ pkg = x
+ else:
  try:
- x_pkg = _pkg_str(remove_slot(x), slot=xslot)
+ pkg = _pkg_str(remove_slot(x))
  except InvalidData:
  continue
- else:
- x_pkg = x
 
- if x_pkg is None:
- mylist.append(x)
- else:
+ if pkg.cp != mydep.cp:
+ continue
  try:
- x_pkg.slot
+ result = vercmp(pkg.version, mydep.version)
+ except ValueError: # pkgcmp may return ValueError during int() conversion
+ writemsg(_("\nInvalid package name: %s\n") % x, noiselevel=-1)
+ raise
+ if result is None:
+ continue
+ elif operator == ">":
+ if result > 0:
+ mylist.append(x)
+ elif operator == ">=":
+ if result >= 0:
+ mylist.append(x)
+ elif operator == "<":
+ if result < 0:
+ mylist.append(x)
+ elif operator == "<=":
+ if result <= 0:
+ mylist.append(x)
+ else:
+ raise KeyError(_("Unknown operator: %s") % mydep)
+ else:
+ raise KeyError(_("Unknown operator: %s") % mydep)
+
+ if mydep.slot is not None:
+ candidate_list = mylist
+ mylist = []
+ for x in candidate_list:
+ x_pkg = None
+ try:
+ x.cpv
  except AttributeError:
+ xslot = dep_getslot(x)
+ if xslot is not None:
+ try:
+ x_pkg = _pkg_str(remove_slot(x), slot=xslot)
+ except InvalidData:
+ continue
+ else:
+ x_pkg = x
+
+ if x_pkg is None:
  mylist.append(x)
  else:
- if _match_slot(mydep, x_pkg):
+ try:
+ x_pkg.slot
+ except AttributeError:
  mylist.append(x)
+ else:
+ if _match_slot(mydep, x_pkg):
+ mylist.append(x)
+ if key_has_hash:
+ _match_from_list_cache[cache_key] = frozenset(mylist)
 
  if mydep.unevaluated_atom.use:
  candidate_list = mylist
--
2.27.0.212.ge8ba1cc988-goog
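
(Illustrative sketch of why the frozenset approach preserves identity:
the set is only consulted for membership, so the returned entries are
the candidate_list objects themselves. Values are made up:)

cached = frozenset(["sys-apps/portage-2.3.99", "app-shells/bash-5.0"])
candidates = ["sys-apps/portage-2.3.99", "dev-lang/python-3.8.3"]
matches = [x for x in candidates if x in cached]
assert matches == ["sys-apps/portage-2.3.99"]
assert matches[0] is candidates[0]  # same object, O(1) set lookup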



Re: Add caching to a few commonly used functions

Fabian Groffen-2
Hi Chun-Yu,

On 26-06-2020 23:34:12 -0700, Chun-Yu Shei wrote:

> Hi,
>
> I was recently interested in whether portage could be sped up, since
> dependency resolution can sometimes take a while on slower machines.
> [...]
> Hopefully, at least some of these patches can be incorporated, and please
> let me know if any changes are necessary.
This sounds like a good job to me!  Do you have any idea what the added
memory pressure for these changes is?

Thanks,
Fabian

--
Fabian Groffen
Gentoo on a different level


Re: Add caching to a few commonly used functions

Chun-Yu Shei-2
Hi Fabian,

Just eyeballing htop's RES column while emerge is running, the max
value I see during "emerge -uDvpU --with-bdeps=y @world" increases
from 272 MB to 380 MB, so it looks to be around 110 MB of extra memory
usage on my system (with 1,094 total packages installed).

Chun-Yu

On Sat, Jun 27, 2020, 12:35 AM Fabian Groffen <[hidden email]> wrote:

> Hi Chun-Yu,
>
> On 26-06-2020 23:34:12 -0700, Chun-Yu Shei wrote:
> > [...]
>
> This sounds like a good job to me!  Do you have any idea what the added
> memory pressure for these changes is?
>
> Thanks,
> Fabian
>
> --
> Fabian Groffen
> Gentoo on a different level


Re: Add caching to a few commonly used functions

Kent Fredric
On Sat, 27 Jun 2020 at 19:35, Fabian Groffen <[hidden email]> wrote:
>
> Hi Chun-Yu,

> > arguments: catpkgsplit, use_reduce, and match_from_list.  In the first
> > two cases, it was simple to cache the results in dicts, while
> > match_from_list was a bit trickier, since it seems to be a requirement
> > that it return actual entries from the input "candidate_list".  I also
> > ran into some test failures if I did the caching after the
> > mydep.unevaluated_atom.use and mydep.repo checks towards the end of the
> > function, so the caching is only done up to just before that point.

You may also want to investigate the version aspect parsing logic
where it converts versions into a data structure, partly because the
last time I tried profiling portage, every sample seemed to turn up in
there.

And I'd expect to see a lot of commonality in this.


# qlist -I --format "%{PV}" | wc -c
14678
# qlist -I --format "%{PV}" | sort -u | wc -c
8811

And given this version-parsing path is exercised even for stuff *not*
installed, I suspect the real-world implications are worse.

# find /usr/portage/ -name "*.ebuild" | sed 's|/usr/portage/||;s|/[^/]*/|/|;s|[.]ebuild$||' | xargs qatom -CF "%{PV}" | wc -l
32604
# find /usr/portage/ -name "*.ebuild" | sed 's|/usr/portage/||;s|/[^/]*/|/|;s|[.]ebuild$||' | xargs qatom -CF "%{PVR}" | sort -u | wc -l
10362
# find /usr/portage/ -name "*.ebuild" | sed 's|/usr/portage/||;s|/[^/]*/|/|;s|[.]ebuild$||' | xargs qatom -CF "%{PV}" | sort -u | wc -l
7515

Obviously this is very crude analysis, but you see there's room to
potentially no-op half of all version parses. Though the speed/memory
tradeoff may not be worth it.

Note that this is not just "parse the version on the ebuild", which
is fast, but my sampling seemed to indicate it was parsing the version
afresh for every version comparison, which means internally, it was
parsing the same version dozens of times over, which is much slower!
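
A sketch of what caching that parse step could look like -- note that
parse_version here is a stand-in for portage's internal version
decomposition, not a real function:

_parsed_versions = {}

def parse_version_cached(ver):
    # decompose each distinct version string once, not once per comparison
    try:
        return _parsed_versions[ver]
    except KeyError:
        result = _parsed_versions[ver] = parse_version(ver)
        return result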




--
Kent

KENTNL - https://metacpan.org/author/KENTNL


Re: [PATCH 1/3] Add caching to catpkgsplit function

Michał Górny-5
On June 27, 2020 6:34:13 AM UTC, Chun-Yu Shei <[hidden email]> wrote:
>According to cProfile, catpkgsplit is called up to 1-5.5 million times
>during "emerge -uDvpU --with-bdeps=y @world". Adding a dict to cache its
>results reduces the time for this command from 43.53 -> 41.53 seconds --
>a 4.8% speedup.


Not saying caching is wrong for an interim solution, but this is the kind of function where refactoring may yield even more gain.


>[...]


--
Best regards,
Michał Górny


Re: Add caching to a few commonly used functions

Zac Medico-2
On 6/26/20 11:34 PM, Chun-Yu Shei wrote:

> Hi,
>
> I was recently interested in whether portage could be sped up, since
> dependency resolution can sometimes take a while on slower machines.
> [...]
> Hopefully, at least some of these patches can be incorporated, and please
> let me know if any changes are necessary.
>
> Thanks,
> Chun-Yu
Using global variables for caches like these causes a form of memory
leak for use cases involving long-running processes that need to work
with many different repositories (and perhaps multiple versions of those
repositories).

There are at least a couple of different strategies that we can use to
avoid this form of memory leak:

1) Limit the scope of the caches so that they have some sort of garbage
collection life cycle. For example, it would be natural for the depgraph
class to have a local cache of use_reduce results, so that the cache can
be garbage collected along with the depgraph.

2) Eliminate redundant calls. For example, redundant calls to catpkgsplit
can be avoided by constructing more _pkg_str instances, since
catpkgsplit is able to return early when its argument happens to be a
_pkg_str instance.
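
A minimal sketch of the first strategy (names here are illustrative, not
actual portage code):

class depgraph:
    def __init__(self):
        # lives and dies with the depgraph instance -- no global leak
        self._use_reduce_cache = {}

    def _use_reduce_cached(self, depstr, cache_key, **kwargs):
        try:
            # cached lists may be mutated by callers, so hand out a copy
            return self._use_reduce_cache[cache_key][:]
        except KeyError:
            result = use_reduce(depstr, **kwargs)
            self._use_reduce_cache[cache_key] = result[:]
            return result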
--
Thanks,
Zac



Re: Add caching to a few commonly used functions

Michał Górny-5
On June 28, 2020 3:00:00 AM UTC, Zac Medico <[hidden email]> wrote:

>On 6/26/20 11:34 PM, Chun-Yu Shei wrote:
>> [...]
>
>Using global variables for caches like these causes a form of memory
>leak for use cases involving long-running processes that need to work
>with many different repositories (and perhaps multiple versions of those
>repositories).
>
>There are at least a couple of different strategies that we can use to
>avoid this form of memory leak:
>
>1) Limit the scope of the caches so that they have some sort of garbage
>collection life cycle. For example, it would be natural for the depgraph
>class to have a local cache of use_reduce results, so that the cache can
>be garbage collected along with the depgraph.
>
>2) Eliminate redundant calls. For example, redundant calls to catpkgsplit
>can be avoided by constructing more _pkg_str instances, since catpkgsplit
>is able to return early when its argument happens to be a _pkg_str
>instance.

I think the weak stuff from the standard library might also be helpful.

--
Best regards,
Michał Górny


Re: Add caching to a few commonly used functions

Zac Medico-2
On 6/27/20 8:12 PM, Michał Górny wrote:

> On June 28, 2020 3:00:00 AM UTC, Zac Medico <[hidden email]> wrote:
>> [...]
>
> I think the weak stuff from the standard library might also be helpful.
>
Hmm, maybe weak global caches are an option?
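
Something along these lines, perhaps (rough sketch; _Entry and
_real_split are made-up names, and note the caveats in the comments):

import weakref

class _Entry:
    # tuples and plain strings cannot be weakly referenced, so the
    # cached value needs a wrapper object that can be
    __slots__ = ("value", "__weakref__")
    def __init__(self, value):
        self.value = value

_cache = weakref.WeakValueDictionary()

def cached_split(cpv):
    entry = _cache.get(cpv)
    if entry is None:
        entry = _Entry(_real_split(cpv))
        _cache[cpv] = entry
    return entry.value

# caveat: entries vanish as soon as nothing else references the _Entry,
# so something would still have to keep hot entries alive to get hits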
--
Thanks,
Zac



Re: Add caching to a few commonly used functions

Michał Górny-5
On June 28, 2020 3:42:33 AM UTC, Zac Medico <[hidden email]> wrote:

>On 6/27/20 8:12 PM, Michał Górny wrote:
>> I think the weak stuff from the standard library might also be helpful.
>
>Hmm, maybe weak global caches are an option?

It would probably be necessary to add a hit/miss counter and compare results before and after.

--
Best regards,
Michał Górny


Re: [PATCH 1/3] Add caching to catpkgsplit function

Sid Spry
On Sat, Jun 27, 2020, at 1:34 AM, Chun-Yu Shei wrote:

> According to cProfile, catpkgsplit is called up to 1-5.5 million times
> during "emerge -uDvpU --with-bdeps=y @world". Adding a dict to cache its
> results reduces the time for this command from 43.53 -> 41.53 seconds --
> a 4.8% speedup.
> [...]

There are libraries that provide decorators, etc., for caching and memoization.
Have you evaluated any of those? One is available in the standard library:
https://docs.python.org/dev/library/functools.html#functools.lru_cache

I comment as this would increase code clarity.
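
For instance (sketch using only the documented functools API, with
catpkgsplit as the candidate from the patch):

from functools import lru_cache

@lru_cache(maxsize=10000)
def catpkgsplit(mydata, silent=1, eapi=None):
    ...  # existing body unchanged

# the decorator also gives introspection for free:
catpkgsplit.cache_info()   # CacheInfo(hits=..., misses=..., maxsize=10000, currsize=...)
catpkgsplit.cache_clear()  # drop all cached entries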


Re: [PATCH 1/3] Add caching to catpkgsplit function

Francesco Riosa-3

On 29/06/20 03:58, Sid Spry wrote:
> There are libraries that provide decorators, etc., for caching and memoization.
> Have you evaluated any of those? One is available in the standard library:
> https://docs.python.org/dev/library/functools.html#functools.lru_cache
>
> I comment as this would increase code clarity.
>
I think portage developers try hard to avoid external dependencies,
and I really hope they do.



Re: [PATCH 1/3] Add caching to catpkgsplit function

Francesco Riosa-3
On 06/07/20 17:50, Michael 'veremitz' Everitt wrote:

> On 06/07/20 16:26, Francesco Riosa wrote:
>> On 29/06/20 03:58, Sid Spry wrote:
>>> There are libraries that provide decorators, etc., for caching and
>>> memoization. One is available in the standard library:
>>> https://docs.python.org/dev/library/functools.html#functools.lru_cache
>> I think portage developers try hard to avoid external dependencies,
>> and I really hope they do.
>>
> I think the key word here is 'external' - anything which is part of the
> python standard library is game for inclusion in portage, and has/does
> provide much needed optimisation. Many of the issues in portage are
> so-called "solved problems" in computing terms, and as such, we should take
> advantage of these to improve performance at every available opportunity.
> Of course, there are presently only one, two or three key developers able
> to make/test these changes (indeed at scale) so progress is often slower
> than desirable in current circumstances...
>
> [sent direct due to posting restrictions...]
Yes, I replied too fast and didn't notice Sid was referring to
_standard_ libraries (not even recent additions).

sorry for the noise

- Francesco



Re: [PATCH 1/3] Add caching to catpkgsplit function

Chun-Yu Shei-2
I finally got a chance to try Sid's lru_cache suggestion, and the
results were really good.  Simply adding it to catpkgsplit and moving
the body of use_reduce into a separate function (that accepts tuples
instead of unhashable lists/sets) and decorating it with lru_cache
gets a similar 40% overall speedup for the upgrade case I tested.  It
seems like even a relatively small cache size (1000 entries) gives
quite a speedup, even though in the use_reduce case, the cache size
eventually reaches almost 20,000 entries if no limit is set.  With
these two changes, adding caching to match_from_list didn't seem to
make much/any difference.

The catch is that lru_cache is only available starting with Python 3.2, so would
it make sense to add a dummy lru_cache implementation for Python < 3.2
that does nothing?  There is also a backports-functools-lru-cache
package that's already available in the Portage tree, but that would
add an additional external dependency.

I agree that refactoring could yield an even bigger gain, but
hopefully this can be implemented as an interim solution to speed up
the common emerge case of resolving upgrades.  I'm happy to submit new
patches for this, if someone can suggest how to best handle the Python
< 3.2 case. :)
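
For concreteness, the dummy implementation could be as simple as this
(untested sketch):

try:
    from functools import lru_cache
except ImportError:
    # Python < 3.2: accept the same arguments, cache nothing
    def lru_cache(maxsize=128, typed=False):
        def decorator(func):
            return func
        return decorator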

Thanks,
Chun-Yu


On Mon, Jul 6, 2020 at 9:10 AM Francesco Riosa <[hidden email]> wrote:

> [...]
> Yes, I replied too fast and didn't notice Sid was referring to
> _standard_ libraries (not even recent additions).
>
> sorry for the noise
>
> - Francesco


Re: [PATCH 1/3] Add caching to catpkgsplit function

Zac Medico-2
On 7/6/20 10:30 AM, Chun-Yu Shei wrote:

> I finally got a chance to try Sid's lru_cache suggestion, and the
> results were really good.  [...]
That's great!

> The catch is that lru_cache is only available starting with Python 3.2,
> so would it make sense to add a dummy lru_cache implementation for
> Python < 3.2 that does nothing?
> [...]
We can safely drop support for < Python 3.6 at this point. Alternatively
we could add a compatibility shim for Python 2.7 that does not perform
any caching, but I really don't think it's worth the trouble to support
it any longer.
--
Thanks,
Zac



Re: [PATCH 1/3] Add caching to catpkgsplit function

Zac Medico-2
On 7/6/20 11:03 AM, Zac Medico wrote:

> On 7/6/20 10:30 AM, Chun-Yu Shei wrote:
>> [...]
>
> We can safely drop support for < Python 3.6 at this point. Alternatively
> we could add a compatibility shim for Python 2.7 that does not perform
> any caching, but I really don't think it's worth the trouble to support
> it any longer.
We've dropped Python 2.7, so now the minimum version is Python 3.6.
--
Thanks,
Zac



Re: [PATCH 1/3] Add caching to catpkgsplit function

Chun-Yu Shei-2
Awesome!  Here's a patch that adds @lru_cache to use_reduce, vercmp, and
catpkgsplit.  use_reduce was split into 2 functions, with the outer one
converting lists/sets to tuples so they can be hashed and creating a
copy of the returned list (since the caller seems to modify it
sometimes).  I tried to select cache sizes that minimized memory use increase,
while still providing about the same speedup compared to a cache with
unbounded size. "emerge -uDvpU --with-bdeps=y @world" runtime decreases
from 44.32s -> 29.94s -- a 48% speedup, while the maximum value of the
RES column in htop increases from 280 MB -> 290 MB.

"emerge -ep @world" time slightly decreases from 18.77s -> 17.93, while
max observed RES value actually decreases from 228 MB -> 214 MB (similar
values observed across a few before/after runs).

Here are the cache hit stats, max observed RES memory, and runtime in
seconds for various cache sizes in the update case.  Caching for each
function was tested independently (only 1 function with caching enabled
at a time):

catpkgsplit:
CacheInfo(hits=1222233, misses=21419, maxsize=None, currsize=21419)
270 MB
39.217

CacheInfo(hits=1218900, misses=24905, maxsize=10000, currsize=10000)
271 MB
39.112

CacheInfo(hits=1212675, misses=31022, maxsize=5000, currsize=5000)
271 MB
39.217

CacheInfo(hits=1207879, misses=35878, maxsize=2500, currsize=2500)
269 MB
39.438

CacheInfo(hits=1199402, misses=44250, maxsize=1000, currsize=1000)
271 MB
39.348

CacheInfo(hits=1149150, misses=94610, maxsize=100, currsize=100)
271 MB
39.487


use_reduce:
CacheInfo(hits=45326, misses=18660, maxsize=None, currsize=18561)
407 MB
35.77

CacheInfo(hits=45186, misses=18800, maxsize=10000, currsize=10000)
353 MB
35.52

CacheInfo(hits=44977, misses=19009, maxsize=5000, currsize=5000)
335 MB
35.31

CacheInfo(hits=44691, misses=19295, maxsize=2500, currsize=2500)
318 MB
35.85

CacheInfo(hits=44178, misses=19808, maxsize=1000, currsize=1000)
301 MB
36.39

CacheInfo(hits=41211, misses=22775, maxsize=100, currsize=100)
299 MB
37.175


I didn't bother collecting detailed stats for vercmp, since the
inputs/outputs are quite small and don't cause much memory increase.
Please let me know if there are any other suggestions/improvements (and
thanks Sid for the lru_cache suggestion!).

Thanks,
Chun-Yu




[PATCH] Add caching to use_reduce, vercmp, and catpkgsplit

Chun-Yu Shei-2
Each of these functions is called repeatedly with the same arguments
many times. Cache sizes were selected to minimize memory use increase,
while still providing about the same speedup compared to a cache with
unbounded size. "emerge -uDvpU --with-bdeps=y @world" runtime decreases
from 44.32s -> 29.94s -- a 48% speedup, while the maximum value of the
RES column in htop increases from 280 MB -> 290 MB.

"emerge -ep @world" time slightly decreases from 18.77s -> 17.93, while
max observed RES value actually decreases from 228 MB -> 214 MB (similar
values observed across a few before/after runs).
---
 lib/portage/dep/__init__.py | 106 +++++++++++++++++++++---------------
 lib/portage/versions.py     |   3 +
 2 files changed, 66 insertions(+), 43 deletions(-)

diff --git a/lib/portage/dep/__init__.py b/lib/portage/dep/__init__.py
index 72988357a..4d91a411a 100644
--- a/lib/portage/dep/__init__.py
+++ b/lib/portage/dep/__init__.py
@@ -23,6 +23,7 @@ portage.proxy.lazyimport.lazyimport(globals(),
  'portage.util:cmp_sort_key,writemsg',
 )
 
+from functools import lru_cache
 from portage import _encodings, _unicode_decode, _unicode_encode
 from portage.eapi import _get_eapi_attrs
 from portage.exception import InvalidAtom, InvalidData, InvalidDependString
@@ -404,49 +405,9 @@ def paren_enclose(mylist, unevaluated_atom=False, opconvert=False):
  mystrparts.append(x)
  return " ".join(mystrparts)
 
-def use_reduce(depstr, uselist=(), masklist=(), matchall=False, excludeall=(), is_src_uri=False, \
- eapi=None, opconvert=False, flat=False, is_valid_flag=None, token_class=None, matchnone=False,
- subset=None):
- """
- Takes a dep string and reduces the use? conditionals out, leaving an array
- with subarrays. All redundant brackets are removed.
-
- @param depstr: depstring
- @type depstr: String
- @param uselist: Sequence of use enabled flags
- @type uselist: Sequence
- @param masklist: Sequence of masked flags (always treated as disabled)
- @type masklist: Sequence
- @param matchall: Treat all conditionals as active. Used by repoman.
- @type matchall: Bool
- @param excludeall: Sequence of flags for which negated conditionals are always treated as inactive.
- @type excludeall: Sequence
- @param is_src_uri: Indicates if depstr represents a SRC_URI
- @type is_src_uri: Bool
- @param eapi: Indicates the EAPI the dep string has to comply to
- @type eapi: String
- @param opconvert: Put every operator as first element into it's argument list
- @type opconvert: Bool
- @param flat: Create a flat list of all tokens
- @type flat: Bool
- @param is_valid_flag: Function that decides if a given use flag might be used in use conditionals
- @type is_valid_flag: Function
- @param token_class: Convert all non operator tokens into this class
- @type token_class: Class
- @param matchnone: Treat all conditionals as inactive. Used by digestgen().
- @type matchnone: Bool
- @param subset: Select a subset of dependencies conditional on the given flags
- @type subset: Sequence
- @rtype: List
- @return: The use reduced depend array
- """
- if isinstance(depstr, list):
- if portage._internal_caller:
- warnings.warn(_("Passing paren_reduced dep arrays to %s is deprecated. " + \
- "Pass the original dep string instead.") % \
- ('portage.dep.use_reduce',), DeprecationWarning, stacklevel=2)
- depstr = paren_enclose(depstr)
-
+@lru_cache(1024)
+def use_reduce_cached(depstr, uselist, masklist, matchall, excludeall, is_src_uri,  eapi, \
+ opconvert, flat, is_valid_flag, token_class, matchnone, subset):
  if opconvert and flat:
  raise ValueError("portage.dep.use_reduce: 'opconvert' and 'flat' are mutually exclusive")
 
@@ -769,6 +730,65 @@ def use_reduce(depstr, uselist=(), masklist=(), matchall=False, excludeall=(), i
 
  return stack[0]
 
+def use_reduce(depstr, uselist=(), masklist=(), matchall=False, excludeall=(), is_src_uri=False, \
+ eapi=None, opconvert=False, flat=False, is_valid_flag=None, token_class=None, matchnone=False,
+ subset=None):
+ """
+ Takes a dep string and reduces the use? conditionals out, leaving an array
+ with subarrays. All redundant brackets are removed.
+
+ @param depstr: depstring
+ @type depstr: String
+ @param uselist: Sequence of use enabled flags
+ @type uselist: Sequence
+ @param masklist: Sequence of masked flags (always treated as disabled)
+ @type masklist: Sequence
+ @param matchall: Treat all conditionals as active. Used by repoman.
+ @type matchall: Bool
+ @param excludeall: Sequence of flags for which negated conditionals are always treated as inactive.
+ @type excludeall: Sequence
+ @param is_src_uri: Indicates if depstr represents a SRC_URI
+ @type is_src_uri: Bool
+ @param eapi: Indicates the EAPI the dep string has to comply to
+ @type eapi: String
+ @param opconvert: Put every operator as first element into it's argument list
+ @type opconvert: Bool
+ @param flat: Create a flat list of all tokens
+ @type flat: Bool
+ @param is_valid_flag: Function that decides if a given use flag might be used in use conditionals
+ @type is_valid_flag: Function
+ @param token_class: Convert all non operator tokens into this class
+ @type token_class: Class
+ @param matchnone: Treat all conditionals as inactive. Used by digestgen().
+ @type matchnone: Bool
+ @param subset: Select a subset of dependencies conditional on the given flags
+ @type subset: Sequence
+ @rtype: List
+ @return: The use reduced depend array
+ """
+ if isinstance(depstr, list):
+ if portage._internal_caller:
+ warnings.warn(_("Passing paren_reduced dep arrays to %s is deprecated. " + \
+ "Pass the original dep string instead.") % \
+ ('portage.dep.use_reduce',), DeprecationWarning, stacklevel=2)
+ depstr = paren_enclose(depstr)
+
+ if uselist is not None:
+ uselist = tuple(uselist)
+ if masklist is not None:
+ masklist = tuple(masklist)
+ if excludeall is not None:
+ excludeall = tuple(excludeall)
+ if subset is not None:
+ subset = tuple(subset)
+
+ result = use_reduce_cached(depstr, uselist, masklist, matchall, excludeall, \
+ is_src_uri, eapi, opconvert, flat, is_valid_flag, token_class, \
+ matchnone, subset)
+
+ # The list returned by this function may be modified, so return a copy.
+ return result[:]
+
 def dep_opconvert(deplist):
  """
  Iterate recursively through a list of deps, if the
diff --git a/lib/portage/versions.py b/lib/portage/versions.py
index 0c21373cc..9ede67230 100644
--- a/lib/portage/versions.py
+++ b/lib/portage/versions.py
@@ -25,6 +25,7 @@ portage.proxy.lazyimport.lazyimport(globals(),
  'portage.repository.config:_gen_valid_repo',
  'portage.util:cmp_sort_key',
 )
+from functools import lru_cache
 from portage import _unicode_decode
 from portage.eapi import _get_eapi_attrs
 from portage.exception import InvalidData
@@ -116,6 +117,7 @@ def ververify(myver, silent=1):
  print(_("!!! syntax error in version: %s") % myver)
  return False
 
+@lru_cache(1024)
 def vercmp(ver1, ver2, silent=1):
  """
  Compare two versions
@@ -313,6 +315,7 @@ def _pkgsplit(mypkg, eapi=None):
 _cat_re = re.compile('^%s$' % _cat, re.UNICODE)
 _missing_cat = 'null'
 
+@lru_cache(10240)
 def catpkgsplit(mydata, silent=1, eapi=None):
  """
  Takes a Category/Package-Version-Rev and returns a list of each.
--
2.27.0.383.g050319c2ae-goog

