OT scripting - strip zero if between period and digit

OT scripting - strip zero if between period and digit

Adam Carter
I need to clean up a file that has IP addresses with leading zeros in some of the octets, so I need to turn, say, .09 into .9

How do I do that in sed/awk/whatever?

Re: OT scripting - strip zero if between period and digit

Michael Orlitzky
On 1/21/19 6:50 PM, Adam Carter wrote:
> I need to clean up a file which has IP addresses with leading zeros in
> some of the octets so I need to make, say, .09 into .9
>
> How do I do that in sed/awk/whatever?

The first thing you should do is construct a bunch of test cases, with
all of the possible input representations and what you think the output
representation should be. Then, you should write a program in something
other than bash that passes all of the test cases. It's not as easy as
it sounds; for example:

   * What happens to 0.1.2.3?

   * What happens to 01.2.3.4?

   * What happens to 1.2.3.0?

   * What happens to 1.2.000.3?

You need a parser, not a regular expression. (You can do it with a
regex, but it's going to be one of those comical twelve-page-long things.)
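
One way to make the test-first advice concrete is a small table of inputs and expected outputs. The sketch below is an illustration, not code from the thread: the name `normalize_ipv4` is hypothetical, and the expected values (every octet read as decimal, anything longer than three digits or above 255 rejected) are assumptions about what the output representation should be.

```python
def normalize_ipv4(text):
    """Return the dotted quad without leading zeros, or None if invalid.

    Hypothetical helper: every octet is read as decimal, never as octal.
    """
    parts = text.split(".")
    if len(parts) != 4:
        return None
    octets = []
    for part in parts:
        # Accept only 1 to 3 ASCII digits; rejects "", "0123", "1a", "+1".
        if not (1 <= len(part) <= 3 and part.isascii() and part.isdigit()):
            return None
        value = int(part)  # int("000") == 0, int("09") == 9
        if value > 255:
            return None
        octets.append(str(value))
    return ".".join(octets)


# Assumed expectations for the four questions above, plus two rejects.
CASES = {
    "0.1.2.3": "0.1.2.3",
    "01.2.3.4": "1.2.3.4",
    "1.2.3.0": "1.2.3.0",
    "1.2.000.3": "1.2.0.3",
    "1.2.3.999": None,
    "1.2.3": None,
}

for given, expected in CASES.items():
    assert normalize_ipv4(given) == expected, given
```

Any candidate solution, regex or parser, can be dropped in place of `normalize_ipv4` and run against the same table.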


Re: OT scripting - strip zero if between period and digit

Grant Taylor-2
On 1/21/19 5:02 PM, Michael Orlitzky wrote:
> You need a parser, not a regular expression.

The first thing that came to mind is splitting the values and passing
them through printf.

> (You can do it with a regex, but it's going to be one of those comical
> twelve-page-long things.)

I don't know about 12 pages.  But, yes, a regular expression that takes
all the possible cases into account, especially for a full four-octet IP,
will be … complicated.  A regular expression that works on an individual
octet might be less complicated.

You can play with REs fairly easily via sed.
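
The split-and-printf idea can be sketched in Python with the `%d` format, which prints an integer without padding. This is only an illustration: the function name `reformat_with_printf` is made up, and it does no validation beyond requiring four numeric fields. Python is used here because the shell's own printf treats a leading 0 as an octal prefix, which is exactly the ambiguity being cleaned up.

```python
def reformat_with_printf(address):
    """Split on dots and re-print each field as a plain decimal integer."""
    fields = address.split(".")
    # "%d" prints an int without padding, so int("088") comes out as "88".
    # A wrong number of fields raises TypeError; a non-numeric field
    # raises ValueError.
    return "%d.%d.%d.%d" % tuple(int(field) for field in fields)


print(reformat_with_printf("198.088.0.001"))  # prints 198.88.0.1
```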


Re: OT scripting - strip zero if between period and digit

Andrew Udvare
In reply to this post by Adam Carter
On 21/01/2019 18:50, Adam Carter wrote:
> I need to clean up a file which has IP addresses with leading zeros in
> some of the octets so I need to make, say, .09 into .9
>
> How do I do that in sed/awk/whatever?

A regex would be difficult. A parser is what you want.

You could use Python's ipaddress module (Python 3.3+). It will fix your
IPs:

python -c $'import ipaddress, sys;\nfor x in sys.argv[1:]: print(ipaddress.ip_address(x))' 1.02.3.4 001.002.003.004

Output:
1.2.3.4
1.2.3.4

Fix that for stdin:

python -c $'import ipaddress, sys;\nfor x in sys.stdin.readlines(): print(ipaddress.ip_address(x.strip()))' <<< $'1.02.3.4\n001.002.003.004'

That way you can do:

python -c $'import ipaddress, sys;\nfor x in sys.stdin.readlines(): print(ipaddress.ip_address(x.strip()))' < list-of-ip-addresses

I'm sure there's a nicer way with modules installed with other languages
but this is built into Python as of version 3.3.
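
A caveat for anyone trying this on a current interpreter: since Python 3.9.5 the ipaddress module treats leading zeros in an IPv4 octet as an error and raises ValueError instead of dropping them. A minimal sketch that still uses ipaddress for validation, but strips the zeros itself first, might look like the following. The name `clean_address` is hypothetical, and reading every octet as decimal is an assumption.

```python
import ipaddress


def clean_address(text):
    """Strip leading zeros per octet, then let ipaddress validate the result."""
    octets = text.strip().split(".")
    # int() reads "003" as decimal 3; non-numeric parts raise ValueError.
    candidate = ".".join(str(int(octet)) for octet in octets)
    # Raises ValueError for out-of-range octets or a wrong number of fields.
    return str(ipaddress.ip_address(candidate))


for raw in ["1.02.3.4", "001.002.003.004"]:
    print(clean_address(raw))
```

To filter a file, the same function could be applied to each line of `sys.stdin`.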

Andrew



Re: OT scripting - strip zero if between period and digit

François-Xavier CARTON
In reply to this post by Adam Carter
Le 22/01/2019 à 00:50, Adam Carter a écrit :
> I need to clean up a file which has IP addresses with leading zeros in
> some of the octets so I need to make, say, .09 into .9
>
> How do I do that in sed/awk/whatever?


I believe that should do:

sed 's/0*\([0-9]\)/\1/g'

eg.

$ sed 's/0*\([0-9]\)/\1/g' <<EOF
0.1.2.3
01.2.3.4
1.2.3.0
1.2.000.3
EOF
0.1.2.3
1.2.3.4
1.2.3.0
1.2.0.3

François-Xavier


Re: OT scripting - strip zero if between period and digit

François-Xavier CARTON
Le 22/01/2019 à 03:05, François-Xavier CARTON a écrit :

> Le 22/01/2019 à 00:50, Adam Carter a écrit :
>> I need to clean up a file which has IP addresses with leading zeros in
>> some of the octets so I need to make, say, .09 into .9
>>
>> How do I do that in sed/awk/whatever?
>
>
> I believe that should do:
>
> sed 's/0*\([0-9]\)/\1/g'
>
> eg.
>
> $ sed 's/0*\([0-9]\)/\1/g' <<EOF
> 0.1.2.3
> 01.2.3.4
> 1.2.3.0
> 1.2.000.3
> EOF
> 0.1.2.3
> 1.2.3.4
> 1.2.3.0
> 1.2.0.3
>
> François-Xavier
>
>

My bad, it should be:

sed 's/0*\([0-9][0-9]*\)/\1/g'

(tests are indeed needed!)
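
To see what the correction changes, the two expressions can be replayed with Python's re module. This is a translation for illustration, not the sed commands themselves: sed's `\(` and `\)` become plain parentheses, and the sample address is made up.

```python
import re

sample = "10.100.020.001"

# First attempt: the group captures exactly one digit, so the scan resumes
# in the middle of a number and treats the zeros that follow as "leading".
first = re.sub(r"0*([0-9])", r"\1", sample)

# Corrected version: the group captures the rest of the digit run.
second = re.sub(r"0*([0-9][0-9]*)", r"\1", sample)

print(first)   # prints 10.10.20.1 (100 was damaged)
print(second)  # prints 10.100.20.1
```

None of the four test inputs above contains an octet such as 100, which is why the first version appeared to pass.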

François-Xavier


Re: OT scripting - strip zero if between period and digit

David Haller-5
In reply to this post by Michael Orlitzky
Hello,

On Mon, 21 Jan 2019, Michael Orlitzky wrote:

>On 1/21/19 6:50 PM, Adam Carter wrote:
>> I need to clean up a file which has IP addresses with leading zeros in
>> some of the octets so I need to make, say, .09 into .9
>>
>> How do I do that in sed/awk/whatever?
>
>The first thing you should do is construct a bunch of test cases, with all of
>the possible input representations and what you think the output
>representation should be. Then, you should write a program in something other
>than bash that passes all of the test cases. It's not as easy as it sounds;
>for example:
>
>  * What happens to 0.1.2.3?
>
>  * What happens to 01.2.3.4?
>
>  * What happens to 1.2.3.0?
>
>  * What happens to 1.2.000.3?
>
>You need a parser, not a regular expression. (You can do it with a regex, but
>it's going to be one of those comical twelve-page-long things.)

$ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
    sed 's/0*\([[:digit:]]\+\)/\1/g'
0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3

HTH,
-dnh

--
printk(KERN_DEBUG "adintr: Why?\n");
        linux-2.6.19/sound/oss/ad1848.c


Re: OT scripting - strip zero if between period and digit

Michael Orlitzky
On 1/21/19 9:55 PM, David Haller wrote:
>
> $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
>      sed 's/0*\([[:digit:]]\+\)/\1/g'
> 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3
>

There are actually more than four examples that it needs to work on. And
more to the point, this is going to destroy any other numbers it finds
in the input. Phone numbers, zip codes, addresses, credit card numbers,
timestamps, etc. will all get clobbered. It takes like 10 lines of
Python to do this right; it's silly to invest a ton of effort trying to
come up with a regex solution that accidentally works.
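
As a rough idea of what those few lines of Python could look like, the sketch below rewrites only tokens that parse as four octets in the range 0-255 and leaves every other number alone. It is one possible reading of the advice, not Michael's code: the names `DOTTED_QUAD` and `fix_line` are hypothetical, and it assumes that anything shaped like a dotted quad (including, say, a four-part version string) should be treated as an address.

```python
import re

# Four dot-separated runs of 1-3 digits, not glued to other digits or dots.
DOTTED_QUAD = re.compile(
    r"(?<![0-9.])([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})(?![0-9.])"
)


def _normalize(match):
    octets = [int(group) for group in match.groups()]
    if any(octet > 255 for octet in octets):
        return match.group(0)  # not an IPv4 address; leave it untouched
    return ".".join(str(octet) for octet in octets)


def fix_line(line):
    """Rewrite only the tokens that parse as IPv4 addresses."""
    return DOTTED_QUAD.sub(_normalize, line)


line = "09:05 host a01 10.001.020.003 tel 0412"
print(re.sub(r"0*([0-9]+)", r"\1", line))  # blanket regex: 9:5 host a1 10.1.20.3 tel 412
print(fix_line(line))                      # targeted: 09:05 host a01 10.1.20.3 tel 0412
```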


Re: OT scripting - strip zero if between period and digit

Adam Carter
In reply to this post by François-Xavier CARTON
> > François-Xavier
> >
> >
>
> My bad, it should be:
>
> sed 's/0*\([0-9][0-9]*\)/\1/g'
>
> (tests are indeed needed!)
 
Many thanks François. This is almost right, but it is also stripping zeros that follow a letter, and I only want it to strip zeros that are preceded by a period. There are no leading zeros in the first octet of the IP, so that case does not need to be handled.

Does the \1 refer to what's in the ()'s? So anything that one would want to carry through should be inside the ()'s and anything that's outside is stripped, right?




Re: OT scripting - strip zero if between period and digit

Adam Carter
In reply to this post by Michael Orlitzky
On Wed, Jan 23, 2019 at 12:34 AM Michael Orlitzky <[hidden email]> wrote:
> On 1/21/19 9:55 PM, David Haller wrote:
>>
>> $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
>>      sed 's/0*\([[:digit:]]\+\)/\1/g'
>> 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3
>>
>
> There are actually more than four examples that it needs to work on. And
> more to the point, this is going to destroy any other numbers it finds
> in the input. Phone numbers, zip codes, addresses, credit card numbers,
> timestamps, etc. will all get clobbered. It takes like 10 lines of
> python to do this right; it's silly to invest a ton of effort trying to
> come up with a regex solution that accidentally works.


Thanks Michael. The input data is constrained in ways I didn't list, so it might be possible to get away with a regex, but I appreciate you highlighting the risk of what sounds like a brittle approach.

I am hopeful that one day learning Python will make it to the top of my priority list.

Re: OT scripting - strip zero if between period and digit

Adam Carter
In reply to this post by David Haller-5
> $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
>     sed 's/0*\([[:digit:]]\+\)/\1/g'
> 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3



Hi David - thanks for that.

So [[:digit:]] is another way of writing [0-9] and the + just means another instance of the preceding expression, right? So your and François's solutions are functionally the same, and all the following are the same too, right?

[[:digit:]]+
[[:digit:]][[:digit:]]
[0-9]+
[0-9][0-9]

Re: OT scripting - strip zero if between period and digit

Paul Colquhoun
On Wednesday, 23 January 2019 2:32:43 PM AEDT Adam Carter wrote:

> > $ printf '0.1.2.3 01.2.3.4 1.2.3.0 1.2.000.3\n' | \
> >
> >     sed 's/0*\([[:digit:]]\+\)/\1/g'
> >
> > 0.1.2.3 1.2.3.4 1.2.3.0 1.2.0.3
>
> Hi David - thanks for that.
>
> So [[:digit:]] is another way of writing [0-9] and the + just means another
> instance of the preceding expression, right, so your and Francois
> solutions are functionally the same, and all the following are the same
> too, right?
>
> [[:digit:]]+
> [[:digit:]][[:digit:]]
> [0-9]+
> [0-9][0-9]


Not quite.

A trailing '+' means "1 or more of the preceding item", while a trailing '*'
means "0 or more".

[0-9]+   would match any string consisting of only digits, no matter how long,
but not an empty string. [0-9][0-9] matches exactly two digits, so it is not
equivalent.
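
The difference is easy to check by trying each pattern against strings of different lengths. The snippet below uses Python's re module for illustration, with made-up samples and a hypothetical helper `fully_matched`. In sed's default syntax the plus has to be written `\+`, as in David's command, or used with `sed -E`.

```python
import re

SAMPLES = ["", "7", "42", "123"]


def fully_matched(pattern):
    """Return the samples that the pattern matches from start to end."""
    return [s for s in SAMPLES if re.fullmatch(pattern, s)]


for pattern in ["[0-9]+", "[0-9][0-9]", "[0-9]*"]:
    print(pattern, "->", fully_matched(pattern))
```

Only `[0-9]*` accepts the empty string, and only `[0-9][0-9]` is limited to exactly two digits.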


--
Reverend Paul Colquhoun, ULC.     http://andor.dropbear.id.au/
  Asking for technical help in newsgroups?  Read this first:
     http://catb.org/~esr/faqs/smart-questions.html#intro





Re: OT scripting - strip zero if between period and digit

François-Xavier CARTON
In reply to this post by Adam Carter
Le 23/01/2019 à 04:19, Adam Carter a écrit :

>      > François-Xavier
>      >
>      >
>
>     My bad, it should be:
>
>     sed 's/0*\([0-9][0-9]*\)/\1/g'
>
>     (tests are indeed needed!)
>
>
> Many thanks François. This is almost right, but it is also stripping
> zeros that follow a letter, and I only want it to strip zeros that are
> preceded by a period. There are no leading zeros in the first octet of
> the IP so that case does not need to be handled.
>
> Does the \1 refer to what's in the ()'s? So anything that one would want
> to carry through should be inside the ()'s and anything that's outside
> is stripped, right?
>
>
>

Yes, \1 is the content in (). But adding letters inside won't solve the
problem, eg. "a01" will still be changed to "a1".

AFAIK, there is no way to express "start of line or a character" in sed,
but you could do two regexps, one starting with ^ (start of line), the
other with \. (dot)


sed 's/^0*\([0-9][0-9]*\)/\1/g;s/\.0*\([0-9][0-9]*\)/.\1/g'
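
For comparison, regex dialects with alternation can fold the two substitutions into one by capturing the context as a group of its own. The sketch below does this with Python's re module; the function name is hypothetical, and the same `(^|\.)` group should also be usable with `sed -E`, though that is not tested here. Note that it would still rewrite a decimal such as 1.05, so it relies on the input containing nothing but addresses after a dot.

```python
import re


def strip_after_dot_or_start(line):
    # Group 1 is the context to keep (start of string or a literal dot);
    # group 2 is the number without its leading zeros.
    return re.sub(r"(^|\.)0*([0-9]+)", r"\1\2", line)


print(strip_after_dot_or_start("01.002.0.100 a01"))  # prints 1.2.0.100 a01
```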


Re: OT scripting - strip zero if between period and digit

Alexander Kapshuk
In reply to this post by Adam Carter
On Wed, Jan 23, 2019 at 5:20 AM Adam Carter <[hidden email]> wrote:

>>
>> > François-Xavier
>> >
>> >
>>
>> My bad, it should be:
>>
>> sed 's/0*\([0-9][0-9]*\)/\1/g'
>>
>> (tests are indeed needed!)
>
>
> Many thanks François. This is almost right, but it is also stripping zeros that follow a letter, and I only want it to strip zeros that are preceded by a period. There are no leading zeros in the first octet of the IP so that case does not need to be handled.
>
> Does the \1 refer to what's in the ()'s? So anything that one would want to carry through should be inside the ()'s and anything that's outside is stripped, right?
>
>
>

Would something like this do the trick?
echo 198.088.062.01 | sed 's/\.0/./g'
198.88.62.1


Re: OT scripting - strip zero if between period and digit

Paul Colquhoun
On Wednesday, 23 January 2019 5:52:57 PM AEDT Alexander Kapshuk wrote:

> On Wed, Jan 23, 2019 at 5:20 AM Adam Carter <[hidden email]> wrote:
> >> > François-Xavier
> >>
> >> My bad, it should be:
> >>
> >> sed 's/0*\([0-9][0-9]*\)/\1/g'
> >>
> >> (tests are indeed needed!)
> >
> > Many thanks François. This is almost right, but it is also stripping zeros
> > that follow a letter, and I only want it to strip zeros that are
> > preceded by a period. There are no leading zeros in the first octet of
> > the IP so that case does not need to be handled.
> >
> > Does the \1 refer to what's in the ()'s? So anything that one would want
> > to carry through should be inside the ()'s and anything that's outside is
> > stripped, right?
> Would something like this do the trick?
> echo 198.088.062.01 | sed 's/\.0/./g'
> 198.88.62.1

In a word, no.

echo 198.088.0.01 | sed 's/\.0/./g'
198.88..1


--
Reverend Paul Colquhoun, ULC.     http://andor.dropbear.id.au/
  Asking for technical help in newsgroups?  Read this first:
     http://catb.org/~esr/faqs/smart-questions.html#intro





Re: OT scripting - strip zero if between period and digit

Alexander Kapshuk
On Wed, Jan 23, 2019 at 9:05 AM Paul Colquhoun
<[hidden email]> wrote:

>
> On Wednesday, 23 January 2019 5:52:57 PM AEDT Alexander Kapshuk wrote:
> > On Wed, Jan 23, 2019 at 5:20 AM Adam Carter <[hidden email]> wrote:
> > >> > François-Xavier
> > >>
> > >> My bad, it should be:
> > >>
> > >> sed 's/0*\([0-9][0-9]*\)/\1/g'
> > >>
> > >> (tests are indeed needed!)
> > >
> > > Many thanks François. This is almost right, but it is also stripping zeros
> > > that follow a letter, and I only want it to strip zeros that are
> > > preceded by a period. There are no leading zeros in the first octet of
> > > the IP so that case does not need to be handled.
> > >
> > > Does the \1 refer to what's in the ()'s? So anything that one would want
> > > to carry through should be inside the ()'s and anything that's outside is
> > > stripped, right?
> > Would something like this do the trick?
> > echo 198.088.062.01 | sed 's/\.0/./g'
> > 198.88.62.1
>
> In a word, no.
>
> echo 198.088.0.01 | sed 's/\.0/./g'
> 198.88..1
>
>
> --
> Reverend Paul Colquhoun, ULC.     http://andor.dropbear.id.au/
>   Asking for technical help in newsgroups?  Read this first:
>      http://catb.org/~esr/faqs/smart-questions.html#intro
>
>
>
>

How about this one?

echo '198.088.0.01
198.088.062.01' | sed 's/\.0\([0-9][0-9]*\)/.\1/g'
198.88.0.1
198.88.62.1


Re: OT scripting - strip zero if between period and digit

Wols Lists
On 23/01/19 07:37, Alexander Kapshuk wrote:

> On Wed, Jan 23, 2019 at 9:05 AM Paul Colquhoun
> <[hidden email]> wrote:
>>
>> On Wednesday, 23 January 2019 5:52:57 PM AEDT Alexander Kapshuk wrote:
>>> On Wed, Jan 23, 2019 at 5:20 AM Adam Carter <[hidden email]> wrote:
>>>>>> François-Xavier
>>>>>
>>>>> My bad, it should be:
>>>>>
>>>>> sed 's/0*\([0-9][0-9]*\)/\1/g'
>>>>>
>>>>> (tests are indeed needed!)
>>>>
>>>> Many thanks François. This is almost right, but it is also stripping zeros
>>>> that follow a letter, and I only want it to strip zeros that are
>>>> preceded by a period. There are no leading zeros in the first octet of
>>>> the IP so that case does not need to be handled.
>>>>
>>>> Does the \1 refer to what's in the ()'s? So anything that one would want
>>>> to carry through should be inside the ()'s and anything that's outside is
>>>> stripped, right?
>>> Would something like this do the trick?
>>> echo 198.088.062.01 | sed 's/\.0/./g'
>>> 198.88.62.1
>>
>> In a word, no.
>>
>> echo 198.088.0.01 | sed 's/\.0/./g'
>> 198.88..1
>>
>>
>> --
>> Reverend Paul Colquhoun, ULC.     http://andor.dropbear.id.au/
>>   Asking for technical help in newsgroups?  Read this first:
>>      http://catb.org/~esr/faqs/smart-questions.html#intro
>>
>>
>>
>>
>
> How about this one?
>
> echo '198.088.0.01
> 198.088.062.01' | sed 's/\.0\([0-9][0-9]*\)/.\1/g'
> 198.88.0.1
> 198.88.62.1
>

I've just done a bit of digging, and would this work to match an octet?

[0-9][0-9]?[0-9]?

I know ? normally matches a single character, but apparently in this
syntax it means "0 or 1 occurrence of the preceding expression". So that
will detect a number consisting of at most three digits.

I thought there must be a "detect a single optional character" operator
... :-)

Cheers,
Wol


Re: OT scripting - strip zero if between period and digit

Michael Orlitzky
On 1/23/19 5:52 AM, Wols Lists wrote:
>
> I've just done a bit of digging, and would this work to match an octet?
>
> [0-9][0-9]?[0-9]?
>

It doesn't match 0123. Regardless, using [0-9] is destined to fail
because it will match things like 999 that also aren't an octet.
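
For reference, a pattern that accepts only the values 0-255 has to spell out the ranges digit by digit. The sketch below, in Python's regex syntax, is one common way to write it; the names `OCTET`, `CANONICAL_IPV4` and `is_canonical` are made up for illustration, and the choice to reject leading zeros is an assumption about what counts as the cleaned-up form.

```python
import re

# One decimal octet, 0-255, with no leading zeros.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
CANONICAL_IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")


def is_canonical(address):
    """True if the whole string is a dotted quad with in-range octets."""
    return CANONICAL_IPV4.fullmatch(address) is not None


for candidate in ["198.88.0.1", "999.1.1.1", "1.2.3.04"]:
    print(candidate, is_canonical(candidate))
```

A check like this is handy for verifying the output of whichever cleanup is chosen, even if the cleanup itself is not done with a regex.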


Re: OT scripting - strip zero if between period and digit

Grant Edwards-6
In reply to this post by Alexander Kapshuk
On 2019-01-23, Alexander Kapshuk <[hidden email]> wrote:
>
> How about this one?
>
> echo '198.088.0.01
> 198.088.062.01' | sed 's/\.0\([0-9][0-9]*\)/.\1/g'
> 198.88.0.1
> 198.88.62.1

Also no.

$ echo 198.088.0.001 |   sed 's/\.0\([0-9][0-9]*\)/.\1/g'
198.88.0.01
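
Running the successive attempts side by side on the same input makes the pattern of failures easier to see. The snippet below replays them with Python's re module as a stand-in for sed; the dictionary labels are made up, and the third entry is the dot-anchored half of François-Xavier's two-part command.

```python
import re

ATTEMPTS = {
    "drop one zero after a dot": (r"\.0", "."),
    "one zero, then digits": (r"\.0([0-9][0-9]*)", r".\1"),
    "any zeros, then digits": (r"\.0*([0-9][0-9]*)", r".\1"),
}


def run_attempts(address):
    """Apply each substitution to the same input and collect the results."""
    return {name: re.sub(pattern, repl, address)
            for name, (pattern, repl) in ATTEMPTS.items()}


for name, result in run_attempts("198.088.0.001").items():
    print(f"{name:28} -> {result}")
```

Only the last pattern yields 198.88.0.1 for this input; the first two give 198.88..01 and 198.88.0.01.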


--
Grant Edwards               grant.b.edwards        Yow! Hello.  Just walk
                                  at               along and try NOT to think
                              gmail.com            about your INTESTINES being
                                                   almost FORTY YARDS LONG!!



Re: OT scripting - strip zero if between period and digit

Neil Bothwick
On Wed, 23 Jan 2019 14:09:45 -0000 (UTC), Grant Edwards wrote:

> > How about this one?
> >
> > echo '198.088.0.01
> > 198.088.062.01' | sed 's/\.0\([0-9][0-9]*\)/.\1/g'
> > 198.88.0.1
> > 198.88.62.1  
>
> Also no.
>
> $ echo 198.088.0.001 |   sed 's/\.0\([0-9][0-9]*\)/.\1/g'
> 198.88.0.01
This is like playing Whack-a-Mole with sed ;-)


--
Neil Bothwick

I know corn oil comes from corn, where does baby oil come from?
