[PATCH] ecompress: optimize docompress -x precompressed comparison

Zac Medico-2
Use sort and comm with temporary files in order to compare lists
of docompress -x and precompressed files, since the file lists
can be extremely large. Also strip ${D%/} from paths in order to
reduce length.

Bug: https://bugs.gentoo.org/721516
Suggested-by: Robin H. Johnson <[hidden email]>
Signed-off-by: Zac Medico <[hidden email]>
---
 bin/ecompress                                 | 29 ++++++++++---------
 .../tests/resolver/ResolverPlayground.py      |  1 +
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/bin/ecompress b/bin/ecompress
index 60b083834..983a4d1f7 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -19,29 +19,30 @@ while [[ $# -gt 0 ]] ; do
  shift
 
  skip_dirs=()
- skip_files=()
+ > "${T}/.ecompress_skip_files" || die
  for skip; do
  if [[ -d ${ED%/}/${skip#/} ]]; then
  skip_dirs+=( "${ED%/}/${skip#/}" )
  else
  rm -f "${ED%/}/${skip#/}.ecompress" || die
- skip_files+=("${ED%/}/${skip#/}")
+ printf '%s\0' "${EPREFIX}/${skip#/}" >> "${T}/.ecompress_skip_files"
  fi
  done
 
  if [[ ${#skip_dirs[@]} -gt 0 ]]; then
- while read -r -d ''; do
- skip_files+=("${REPLY%.ecompress}")
+ while read -r -d '' skip; do
+ skip=${skip%.ecompress}
+ printf '%s\0' "${skip#${D%/}}" >> "${T}/.ecompress_skip_files"
  done < <(find "${skip_dirs[@]}" -name '*.ecompress' -print0 -delete || die)
  fi
 
- if [[ ${#skip_files[@]} -gt 0 && -s ${T}/.ecompress_had_precompressed ]]; then
- sed_args=()
- for f in "${skip_files[@]}"; do
- sed_args+=("s|^${f}\$||;")
- done
- sed_args+=('/^$/d')
- sed -f - -i "${T}/.ecompress_had_precompressed" <<< "${sed_args[@]}" || die
+ if [[ -s ${T}/.ecompress_skip_files && -s ${T}/.ecompress_had_precompressed ]]; then
+ # Filter skipped files from ${T}/.ecompress_had_precompressed,
+ # using temporary files since these lists can be extremely large.
+ LC_COLLATE=C sort -zu "${T}/.ecompress_skip_files" > "${T}/.ecompress_skip_files_sorted"|| die
+ LC_COLLATE=C sort -zu "${T}/.ecompress_had_precompressed" > "${T}/.ecompress_had_precompressed_sorted" || die
+ LC_COLLATE=C comm -z13 "${T}/.ecompress_skip_files_sorted" "${T}/.ecompress_had_precompressed_sorted" > "${T}/.ecompress_had_precompressed" || die
+ rm -f "${T}/.ecompress_had_precompressed_sorted" "${T}/.ecompress_skip_files"{,_sorted}
  fi
 
  exit 0
@@ -81,7 +82,7 @@ while [[ $# -gt 0 ]] ; do
  continue 2
  fi
  done
- echo "${path}" >> "${T}"/.ecompress_had_precompressed
+ printf '%s\0' "${path#${D%/}}" >> "${T}"/.ecompress_had_precompressed || die
  ;;
  esac
 
@@ -195,8 +196,8 @@ if [[ -s ${T}/.ecompress_had_precompressed ]]; then
  eqawarn "(manpages, documentation) when automatic compression is used:"
  eqawarn
  n=0
- while read -r f; do
- eqawarn "  ${f#${D%/}}"
+ while read -r -d '' f; do
+ eqawarn "  ${f}"
  if [[ $(( n++ )) -eq 10 ]]; then
  eqawarn "  ..."
  break
diff --git a/lib/portage/tests/resolver/ResolverPlayground.py b/lib/portage/tests/resolver/ResolverPlayground.py
index de80a0cc1..ec2e31ae9 100644
--- a/lib/portage/tests/resolver/ResolverPlayground.py
+++ b/lib/portage/tests/resolver/ResolverPlayground.py
@@ -91,6 +91,7 @@ class ResolverPlayground(object):
  "chgrp",
  "chmod",
  "chown",
+ "comm",
  "cp",
  "egrep",
  "env",
--
2.25.3
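
For readers following the path handling in the patch, the prefix stripping and the NUL-delimited list format can be sketched in isolation. This is only an illustration, not code from Portage: the values of D and T, the list file name, and the sample paths are hypothetical stand-ins for the real ebuild environment, and bash is assumed because `read -d ''` and arrays are not POSIX sh features.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for Portage's ${D} (image dir) and ${T} (temp dir).
D=/tmp/image/
T=$(mktemp -d) || exit 1
list="${T}/paths.list"   # assumed file name, for illustration only

# Start with an empty list, then append one NUL-terminated record per path.
: > "${list}" || exit 1
for path in \
    "${D%/}/usr/share/man/man1/foo.1.gz" \
    "${D%/}/usr/share/doc/foo/README.bz2"
do
    # ${D%/} removes a single trailing slash from D;
    # ${path#...} then strips that prefix from the front of the path.
    printf '%s\0' "${path#${D%/}}" >> "${list}" || exit 1
done

# Read the records back, as the QA warning loop in the patch does.
stripped=()
while read -r -d '' f; do
    stripped+=( "${f}" )
done < "${list}"

printf '%s\n' "${stripped[@]}"
rm -rf "${T}"
```

With these sample values the loop prints /usr/share/man/man1/foo.1.gz and /usr/share/doc/foo/README.bz2, so the list stores the shorter image-relative paths and the warning loop no longer has to strip the prefix when printing.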


Re: [PATCH] ecompress: optimize docompress -x precompressed comparison

Zac Medico-2
On 6/28/20 12:54 PM, Zac Medico wrote:
> + LC_COLLATE=C sort -zu "${T}/.ecompress_skip_files" > "${T}/.ecompress_skip_files_sorted"|| die
> + LC_COLLATE=C sort -zu "${T}/.ecompress_had_precompressed" > "${T}/.ecompress_had_precompressed_sorted" || die
> + LC_COLLATE=C comm -z13 "${T}/.ecompress_skip_files_sorted" "${T}/.ecompress_had_precompressed_sorted" > "${T}/.ecompress_had_precompressed" || die

I've updated my branch to use \n separators, since POSIX comm does not
support the -z option.
--
Thanks,
Zac
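
The newline-separated approach mentioned in the reply above could look roughly like the following. This is a minimal sketch, not the code from the updated branch: the function name filter_skipped and the sample file names are hypothetical, and LC_ALL=C is used here in place of the patch's LC_COLLATE=C as an assumption, so that sort and comm agree on byte ordering whatever the caller's locale is.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of "$2" that do not appear in "$1".
filter_skipped() {
    local skip_list=$1 precompressed_list=$2 tmpdir status
    tmpdir=$(mktemp -d) || return 1
    # comm expects both inputs to be sorted in the same collation order.
    LC_ALL=C sort -u "${skip_list}" > "${tmpdir}/skip" || return 1
    LC_ALL=C sort -u "${precompressed_list}" > "${tmpdir}/pre" || return 1
    # -1 suppresses lines unique to the skip list and -3 suppresses lines
    # common to both, leaving only lines unique to the precompressed list.
    LC_ALL=C comm -13 "${tmpdir}/skip" "${tmpdir}/pre"
    status=$?
    rm -rf "${tmpdir}"
    return "${status}"
}

# Usage with made-up sample data.
work=$(mktemp -d) || exit 1
printf '%s\n' /usr/share/doc/foo/a.gz /usr/share/doc/foo/c.gz > "${work}/skip"
printf '%s\n' /usr/share/doc/foo/c.gz /usr/share/doc/foo/b.gz \
    /usr/share/doc/foo/a.gz /usr/share/man/man1/x.1.gz > "${work}/pre"
remaining=$(filter_skipped "${work}/skip" "${work}/pre")
printf '%s\n' "${remaining}"
rm -rf "${work}"
```

For the sample data this leaves /usr/share/doc/foo/b.gz and /usr/share/man/man1/x.1.gz. The trade-off of newline separators is that a path containing an embedded newline can no longer be represented as a single record, which the NUL-delimited version of the patch avoided.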


Re: [PATCH] ecompress: optimize docompress -x precompressed comparison

Robin H. Johnson-2
On Sun, Jun 28, 2020 at 12:54:56PM -0700, Zac Medico wrote:
> Use sort and comm with temporary files in order to compare lists
> of docompress -x and precompressed files, since the file lists
> can be extremely large. Also strip ${D%/} from paths in order to
> reduce length.
+1 looks much better.

--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : [hidden email]
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136