Need help with narrowly focused use case of Emacs

Richard Owlett

2024-06-28 19:10:02 UTC

Pluma is my editor of choice.
*BUT* it can NOT handle Search and Replace operations involving regular
expressions.

Emacs can. It has much verbose documentation.
But examples seem rather scarce.

I need to replace ANY occurrence of
<span class="verse" id="V1">
thru [at most]
<span class="verse" id="V119">
by
<sup>

I'm reformatting a Bible stored in HTML format for a particular set of
vision impaired seniors (myself included). Each chapter is in its own file.

How do I open a file.
Do the above replacement.
Save and close the file.

Help please.
TIA

didier gaumet

2024-06-28 19:20:01 UTC

Post by Richard Owlett
Pluma is my editor of choice.
*BUT* it can NOT handle Search and Replace operations involving regular
expressions.

[...]

Hello Richard,

According to the Mate wiki, Pluma handles regular expressions the Perl way:
https://wiki.mate-desktop.org/mate-desktop/applications/pluma/
https://perldoc.perl.org/perlre

t***@tuxteam.de

2024-06-29 05:00:01 UTC

Post by didier gaumet

Post by Richard Owlett
Pluma is my editor of choice.
*BUT* it can NOT handle Search and Replace operations involving regular
expressions.

[...]
Hello Richard,
https://wiki.mate-desktop.org/mate-desktop/applications/pluma/
https://perldoc.perl.org/perlre

See? I was sure of that. And Perl style regexps are actually somewhat
friendlier than Emacs style (they're roughly one decennium younger).

Thanks, Didier :-)

Cheers

--
t

Richard Owlett

2024-06-29 10:10:01 UTC

Post by didier gaumet

Post by Richard Owlett
Pluma is my editor of choice.
*BUT* it can NOT handle Search and Replace operations involving
regular expressions.

[...]
Hello Richard,
https://wiki.mate-desktop.org/mate-desktop/applications/pluma/

Hadn't seen that page. I based my opinion on what I saw when doing a
Search and Replace. Also Pluma's Help function doesn't mention it.

Post by didier gaumet
https://perldoc.perl.org/perlre

That page is thin on examples. But now knowing that Pluma does things
"the Perl way" I can do a web search.

Thank you.

Michael Kjörling

2024-06-28 21:00:01 UTC

Post by Richard Owlett
I need to replace ANY occurrence of
<span class="verse" id="V1">
thru [at most]
<span class="verse" id="V119">
by
<sup>
I'm reformatting a Bible stored in HTML format for a particular set of
vision impaired seniors (myself included). Each chapter is in its own file.
How do I open a file.
Do the above replacement.
Save and close the file.

Ignoring the question about Emacs and focusing on the goal (your
question otherwise is an excellent example of an XY question), this is
not something regular expressions are very good at. However, since
it's presumably a once-only operation, I assume that you can live with
it being done in a suboptimal way in terms of performance.

In that case, assuming for simplicity that all the files are in a
single directory, you could try something similar to:

$ for v in $(seq 1 119); do sed -i 's,<span class="verse" id="V'$v'">,<sup>,g' ./*.html; done

Be sure to have a copy in case something goes wrong; and diff(1) a few
files afterwards to make sure that the result is as you intended.

Yes, it almost certainly can be done with a single sed (or other
similar tool) invocation where the regular expression matches
precisely what you want it to match. But unless this is something you
will do very often, I tend to prefer readability over being clever,
even if the readable version is somewhat less performant.
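For comparison, the single-invocation variant mentioned above might look like the sketch below. It assumes GNU sed (for -E and -i.bak) and accepts any verse number rather than exactly 1 to 119; the scratch directory, sample.html and its contents are hypothetical stand-ins for a real chapter file.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample chapter; real files will contain much more markup.
printf '%s\n' \
  '<span class="verse" id="V1">1</span> In the beginning' \
  '<span class="verse" id="V119">119</span> Last verse' \
  '<span class="note" id="N1">untouched</span>' > sample.html

# -E enables extended regexps; [0-9]+ matches any run of digits.
# -i.bak edits in place and keeps the original as sample.html.bak.
sed -E -i.bak 's,<span class="verse" id="V[0-9]+">,<sup>,g' sample.html

cat sample.html
```

Only the two verse spans are rewritten; the span with a different class is left alone, and the closing tags are not touched at all.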

--
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”

Charles Curley

2024-06-29 03:30:01 UTC

On Fri, 28 Jun 2024 20:53:50 +0000

Post by Michael Kjörling
$ for v in $(seq 1 119); do sed -i 's,<span class="verse" id="V'$v'">,<sup>,g' ./*.html; done
Be sure to have a copy in case something goes wrong; and diff(1) a few
files afterwards to make sure that the result is as you intended.

Having done that (or similar), don't forget to change the relevant
</span> closing tags to </sup> closing tags. However, there may be
other </span> closing tags you don't want to change because they close
other <span> tags we haven't seen. So you may prefer to use regexes as
Murphy intended, handling both the opening and closing tags at the same
time, leaving the intervening text intact.

--
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/

Richard Owlett

2024-06-29 11:20:14 UTC

Post by Charles Curley
On Fri, 28 Jun 2024 20:53:50 +0000

Post by Michael Kjörling
$ for v in $(seq 1 119); do sed -i 's,<span class="verse" id="V'$v'">,<sup>,g' ./*.html; done
Be sure to have a copy in case something goes wrong; and diff(1) a few
files afterwards to make sure that the result is as you intended.

Having done that (or similar), don't forget to change the relevant
</span> closing tags to </sup> closing tags. However, there may be
other </span> closing tags you don't want to change because they close
other <span> tags we haven't seen.

Chuckle ;} The appropriate "</span>" to be replaced by "</sup>" is
ALWAYS preceded by "#160;" .
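If the closing tag really is always preceded by that marker, the marker itself can anchor the substitution. A minimal sketch, assuming GNU sed; the sample line and the file name sample.html are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: one verse span followed by an unrelated span.
printf '%s\n' \
  '<span class="verse" id="V1">1&#160;</span><span class="red">words</span>' > sample.html

# Only a </span> directly preceded by "#160;" becomes </sup>.
sed -i.bak 's,#160;</span>,#160;</sup>,g' sample.html

cat sample.html
```

The second closing tag is not preceded by the marker, so it stays a </span>.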

Post by Charles Curley
So you may prefer to use regexes as
Murphy intended, handling both the opening and closing tags at the same
time, leaving the intervening text intact.

In this particular case I suspect it would become overly complex.
I've already discovered that the order of edits is important.

Michael Kjörling

2024-06-29 13:50:02 UTC

Post by Charles Curley

Post by Michael Kjörling
$ for v in $(seq 1 119); do sed -i 's,<span class="verse" id="V'$v'">,<sup>,g' ./*.html; done

Having done that (or similar), don't forget to change the relevant
</span> closing tags to </sup> closing tags. However, there may be
other </span> closing tags you don't want to change because they close
other <span> tags we haven't seen.

Chuckle ;} The appropriate "</span>" to be replaced by "</sup>" is ALWAYS
preceded by "#160;" .

As far as I can see, neither of these was stated in the original
question. Please don't add arbitrary requirements later to invalidate
potential answers.

--
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”

Andy Smith

2024-06-29 14:20:01 UTC

Hello,

Post by Michael Kjörling

there may be other </span> closing tags you don't want to
change because they close other <span> tags we haven't seen.

Chuckle ;} The appropriate "</span>" to be replaced by "</sup>" is ALWAYS
preceded by "#160;" .

As far as I can see, neither of these was stated in the original
question. Please don't add arbitrary requirements later to invalidate
potential answers.

It's not an authentic Owlett thread unless it contains an enormous
XY problem, a monomaniacal obsession with a solution already
part-dreamed up by the OP, several factual errors, and a constant
trickle of confounding small details that were never provided up
front, now delivered with glee.

Otherwise it's just sparkling timewasting.

Thanks,
Andy

--
https://bitfolk.com/ -- No-nonsense VPS hosting

Lee

2024-06-29 16:10:02 UTC

Hi,

Post by Richard Owlett

Post by Charles Curley
So you may prefer to use regexes as
Murphy intended, handling both the opening and closing tags at the same
time, leaving the intervening text intact.

In this particular case I suspect it would become overly complex.
I've already discovered that the order of edits is important.

I guess it depends on what you're used to. I don't think this bit is
overly complex .. your opinion might be different

$ cat /tmp/z
cat /dev/null > txtfile.html
for v in $(seq 1 12); do echo '<span class="verse" id="V'$v'"> text text text </span>' >> txtfile.html; done
sed -Ei.bak 's@<span class="verse" id="V[[:digit:]]+">([^<]*)</span>@<sup>\1</sup>@g' txtfile.html

$ bash z

$ cat txtfile*
<sup> text text text </sup>
<sup> text text text </sup>
<sup> text text text </sup>
<sup> text text text </sup>
<sup> text text text </sup>
<sup> text text text </sup>
<sup> text text text </sup>
<sup> text text text </sup>
<sup> text text text </sup>
<sup> text text text </sup>
<sup> text text text </sup>
<sup> text text text </sup>
<span class="verse" id="V1"> text text text </span>
<span class="verse" id="V2"> text text text </span>
<span class="verse" id="V3"> text text text </span>
<span class="verse" id="V4"> text text text </span>
<span class="verse" id="V5"> text text text </span>
<span class="verse" id="V6"> text text text </span>
<span class="verse" id="V7"> text text text </span>
<span class="verse" id="V8"> text text text </span>
<span class="verse" id="V9"> text text text </span>
<span class="verse" id="V10"> text text text </span>
<span class="verse" id="V11"> text text text </span>
<span class="verse" id="V12"> text text text </span>

$

Regards,
Lee

Greg Wooledge

2024-06-29 12:50:01 UTC

Post by Charles Curley
On Fri, 28 Jun 2024 20:53:50 +0000

Post by Michael Kjörling
$ for v in $(seq 1 119); do sed -i 's,<span class="verse" id="V'$v'">,<sup>,g' ./*.html; done
Be sure to have a copy in case something goes wrong; and diff(1) a few
files afterwards to make sure that the result is as you intended.

Having done that (or similar), don't forget to change the relevant
</span> closing tags to </sup> closing tags. However, there may be
other </span> closing tags you don't want to change because they close
other <span> tags we haven't seen. So you may prefer to use regexes as
Murphy intended, handling both the opening and closing tags at the same
time, leaving the intervening text intact.

https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454

Richard Owlett

2024-06-29 11:00:05 UTC

Post by Michael Kjörling

Post by Richard Owlett
I need to replace ANY occurrence of
<span class="verse" id="V1">
thru [at most]
<span class="verse" id="V119">
by
<sup>
I'm reformatting a Bible stored in HTML format for a particular set of
vision impaired seniors (myself included). Each chapter is in its own file.
How do I open a file.
Do the above replacement.
Save and close the file.

Ignoring the question about Emacs

Emacs *CAN NOT* be ignored.
It is the _available_ editor known to be capable of handling regular
expressions.

Post by Michael Kjörling
and focusing on the goal (your
question otherwise is an excellent example of an XY question), this is
not something regular expressions are very good at.

HUH ??????????

Post by Michael Kjörling
However, since
it's presumably a once-only operation, I assume that you can live with
it being done in a suboptimal way in terms of performance.
In that case, assuming for simplicity that all the files are in a
single directory, you could try something similar to:
$ for v in $(seq 1 119); do sed -i 's,<span class="verse" id="V'$v'">,<sup>,g' ./*.html; done

I'll have to investigate sed further.
My project is not yet to the point of automatically editing ALL
chapters. I need to first establish how to edit all VERSES of an
individual chapter.

Post by Michael Kjörling
Be sure to have a copy in case something goes wrong; and diff(1) a few
files afterwards to make sure that the result is as you intended.

ROFL ;} No one would define me as a "programmer". I took an introduction
to computers course as an E.E. student in the 60's. Most of my jobs
required background in component level analog electronics. Got one
assignment because I was not "afraid" of 8080 ;}

Post by Michael Kjörling
Yes, it almost certainly can be done with a single sed (or other
similar tool) invocation where the regular expression matches
precisely what you want it to match. But unless this is something you
will do very often, I tend to prefer readability over being clever,
even if the readable version is somewhat less performant.

d***@howorth.org.uk

2024-06-29 12:00:01 UTC

Post by Richard Owlett

Post by Michael Kjörling

Post by Richard Owlett
I need to replace ANY occurrence of
<span class="verse" id="V1">
thru [at most]
<span class="verse" id="V119">
by
<sup>
I'm reformatting a Bible stored in HTML format for a particular
set of vision impaired seniors (myself included). Each chapter is
in its own file.
How do I open a file.
Do the above replacement.
Save and close the file.

Ignoring the question about Emacs

Emacs *CAN NOT* be ignored.
It is the _available_ editor known to be capable of handling regular
expressions.

Err, pluma is available I believe. I've never used it but I just
started it and used the Replace... entry on the Search menu to bring up
a dialog box. In the dialog box there is a tick box labelled "Match
regular expression". So I ticked that and then tested it by editing an
html file using an RE.

So Pluma is an "_available_ editor known to be capable of handling
regular expressions."

And as others have pointed out, sed is available and it's easy to
install others. So there are many possible answers to your question
other than emacs.

Richard Owlett

2024-06-29 13:00:01 UTC

Post by d***@howorth.org.uk

Post by Richard Owlett

Post by Michael Kjörling

Post by Richard Owlett
I need to replace ANY occurrence of
<span class="verse" id="V1">
thru [at most]
<span class="verse" id="V119">
by
<sup>
I'm reformatting a Bible stored in HTML format for a particular
set of vision impaired seniors (myself included). Each chapter is
in its own file.
How do I open a file.
Do the above replacement.
Save and close the file.

Ignoring the question about Emacs

Emacs *CAN NOT* be ignored.
It is the _available_ editor known to be capable of handling regular
expressions.

Err, pluma is available I believe.

May I quote my original post?

Post by d***@howorth.org.uk

Post by Richard Owlett
Pluma is my editor of choice.

I've never used it but I just
started it and used the Replace... entry on the Search menu to bring up
a dialog box. In the dialog box there is a tick box labelled "Match
regular expression". So I ticked that and then tested it by editing an
html file using an RE.
So Pluma is an "_available_ editor known to be capable of handling
regular expressions."

So you evidently have a later version than I have available for this
particular machine.
One does not get latest and greatest by simply wishing for it.

Post by d***@howorth.org.uk
And as others have pointed out, sed is available and it's easy to
install others. So there are many possible answers to your question
other than emacs.

My definition of "available" includes knowledge of how to use it.
I've investigated it for some past projects and found easier way to
accomplish those particular tasks. Part of my interest in Emacs stems
from having seen what co-workers could do with its predecessor TECO
decades ago.

Updating MY system is NONtrivial!

Dan Ritter

2024-06-29 12:10:02 UTC

Post by Richard Owlett

Post by Michael Kjörling

Post by Richard Owlett
I need to replace ANY occurrence of
<span class="verse" id="V1">
thru [at most]
<span class="verse" id="V119">
by
<sup>
I'm reformatting a Bible stored in HTML format for a particular set of
vision impaired seniors (myself included). Each chapter is in its own file.
How do I open a file.
Do the above replacement.
Save and close the file.

Ignoring the question about Emacs

Emacs *CAN NOT* be ignored.
It is the _available_ editor known to be capable of handling regular
expressions.

If your machine doesn't have sed, it is not a working Debian
system.

Every Debian machine comes with sed by default. Even the
rescue image has sed. The installer environment, before Debian
is actually installed, has sed. sed is a basic tool that
everyone has access to. emacs needs to be installed, and often
is not.

I know from past experience that it's useless to offer you any
solution that deviates from the vision you have for the way the
world ought to work, but this is a sufficiently common kind of
problem that a full answer will be useful to other people.

Post by Richard Owlett

Post by Michael Kjörling
and focusing on the goal (your
question otherwise is an excellent example of an XY question), this is
not something regular expressions are very good at.

HUH ??????????

An XY question is when someone asks "How can I do specific thing
X?" but what they want to do is task Y, which is more easily
accomplished in a different way that doesn't involve X at all.
Usually this means that they have read something that tells them
about X in a different context, and they think that is an
essential part of solving their Y problem.

If we're lucky, they tell us what Y is. Frequently, XY questions
just show up as "How do I do X?" without context.

It happens a lot on this mailing list.

Or, maybe your expression of disbelief was about regular
expressions? A regular expression (regexp) is a specific kind of
formal language for specifying a pattern of tokens -- what we
often call a "string". If the regexp describes a candidate
string, we call that a "match". A common editing task is to find
all the matches for a regexp and replace them with some other
string.

The program "grep" takes its name from a sequence of editor
commands: global regular expression print.

Michael says that regexps aren't great at this particular task
because there's a variable component in the pattern which is
hard to describe. He comes up with a clever solution based on
the fact that the variable component is going to be an integer
sequence.

Post by Richard Owlett

Post by Michael Kjörling
However, since
it's presumably a once-only operation, I assume that you can live with
it being done in a suboptimal way in terms of performance.
In that case, assuming for simplicity that all the files are in a
single directory, you could try something similar to:
$ for v in $(seq 1 119); do sed -i 's,<span class="verse" id="V'$v'">,<sup>,g' ./*.html; done

This sets up a loop which will execute 119 times, incrementing
the variable $v from 1 to 119. Inside the loop, it calls `sed`
to execute inplace (-i) which means it will change the files it
encounters rather than spitting out new files on standard out.

The command passed to sed is

s,<span class="verse" id="V'$v'">,<sup>,g

s means string substitution. It takes a pattern, a replacement,
and options, separated by the next character after the s, which
in this case is a comma.

<span class="verse" id="V$v">

is the pattern. Because of the loop, the value $v is going to be
replaced by the shell before sed sees this, so on various runs
through the loop sed will see:

<span class="verse" id="V1">
<span class="verse" id="V2">
...
<span class="verse" id="V118">
<span class="verse" id="V119">

You'll probably need to adjust this for other books.

Anyway, whenever sed sees the pattern above, it will replace it
with:

<sup>

which is what you said you wanted.

The option "g" means that sed should do this multiple times if
it occurs in the same file (globally, like grep) instead of the
default behavior which is to find the first match and just
change that.

./*.html

tells sed to operate on all the files in the current directory
ending in .html -- yes, shells implement a version of regexp for
file pattern matching. And that's the end of the loop.
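The quoting in that command can be checked without touching any file. The sketch below only prints the sed script that the shell assembles for the first three values of v; the variable name script is introduced here purely for illustration.

```shell
# Print the sed script the shell builds for v = 1, 2, 3.
# The single-quoted pieces are passed through literally; only $v is expanded.
for v in 1 2 3; do
  script='s,<span class="verse" id="V'$v'">,<sup>,g'
  echo "$script"
done
```

Each printed line is exactly what sed would receive as its command on that pass through the loop.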

Post by Richard Owlett
I'll have to investigate sed further.
My project is not yet to the point of automatically editing ALL chapters. I
need to first establish how to edit all VERSES of an individual chapter.

The solution Michael presented can be run on just one file
instead of all the .html files in the current directory.
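A sketch of such a single-file run, assuming GNU sed and a hypothetical file called chapter.html. The copy is made once before the loop, because an in-place backup suffix such as -i.bak used inside the loop would overwrite the backup on every pass and leave an already-edited file behind.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for one chapter file.
printf '%s\n' \
  '<span class="verse" id="V1">1</span> text' \
  '<span class="verse" id="V2">2</span> more text' > chapter.html

# Keep an untouched copy for comparison.
cp chapter.html chapter.html.orig

# Same loop as above, but naming a single file instead of ./*.html.
for v in $(seq 1 119); do
  sed -i 's,<span class="verse" id="V'"$v"'">,<sup>,g' chapter.html
done

# diff exits non-zero when the files differ, which is expected here.
diff chapter.html.orig chapter.html || true
```

The diff output lists every changed line, which makes it easy to spot a substitution that went further than intended.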

Post by Richard Owlett
ROFL ;} No one would define me as a "programmer". I took an introduction to
computers course as an E.E. student in the 60's. Most of my jobs required
background in component level analog electronics. Got one assignment because
I was not "afraid" of 8080 ;}

The true UNIX philosophy is that at any moment, any user can
stop being "just a user" and use the tools present to do some
programming to solve their problems.

-dsr-

Greg Wooledge

2024-06-29 13:00:01 UTC

Post by Dan Ritter
The option "g" means that sed should do this multiple times if
it occurs in the same file (globally, like grep) instead of the
default behavior which is to find the first match and just
change that.

The g option in sed's s command means it will apply the substitution
multiple times per *line*. Not per file. It always applies multiple
times per file, unless you restrict the line range with a prefix.

hobbit:~$ printf 'foo foo\nfoo foo\n' | sed s/foo/bar/
bar foo
bar foo
hobbit:~$ printf 'foo foo\nfoo foo\n' | sed s/foo/bar/g
bar bar
bar bar
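The line-range prefix mentioned above can be illustrated the same way. In this sketch the address 1 restricts the substitution to the first input line; the variable name first_only is made up for the example.

```shell
# Address "1" limits the s command to line 1; line 2 passes through unchanged.
first_only=$(printf 'foo foo\nfoo foo\n' | sed '1s/foo/bar/g')
printf '%s\n' "$first_only"
```

With the g flag both occurrences on line 1 are replaced, while the second line keeps both of its "foo"s.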

Michael Kjörling

2024-06-29 13:50:02 UTC

Post by Richard Owlett

Post by Michael Kjörling
Ignoring the question about Emacs

Emacs *CAN NOT* be ignored.

I did not say to ignore _Emacs_. I said that I was ignoring the
_question_ about Emacs, to instead...

Post by Richard Owlett

Post by Michael Kjörling
and focusing on the goal (your

^^^^^^^^^^^^^^^^^^^^^^^^

Post by Richard Owlett

Post by Michael Kjörling
question otherwise is an excellent example of an XY question), this is
not something regular expressions are very good at.

HUH ??????????

..._focus on the goal_.

(It is usually a good idea to read at least a whole sentence before
responding to it.)

The _goal_ in this case being your stated specific series of string
replacements.

If you want to use Emacs to do that, no one is stopping you from doing
so. You can directly adapt what I suggested to an Emacs workflow. But
just because a nailgun can be used to hang a painting doesn't mean
that a nailgun is the _appropriate_ tool for that particular job;
without detracting from its usability in _other_ applications.

Sometimes really all you are looking for is a small hammer.

--
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”

Curt

2024-06-29 16:10:02 UTC

Post by Michael Kjörling

Post by Richard Owlett
HUH ??????????

..._focus on the goal_.

Owlett is a notorious troll who never listens to reason.

But you people adore this kind of troll, inexplicably, perhaps because
he allows you to expand endlessly on your reams of essentially useless
knowledge.

t***@tuxteam.de

2024-06-29 17:00:01 UTC

Post by Michael Kjörling

Post by Richard Owlett
HUH ??????????

..._focus on the goal_.

Owlett is a notorious troll who never listens to reason.

This is wrong, borderline defamatory. Richard Owlett is not a
troll [1]. He may be uncommon in the way he approaches things,
and I do understand his ways may annoy some people.

If they annoy you, you always may choose to not respond. Others
will chime in. Much more polite and much more effective for the
whole mailing list.

Lobbing insults at people doesn't help anyone.

Cheers

[1] by the very definition of "troll", who isn't interested in the topic
itself, but just in eliciting a response.

--
t

Curt

2024-06-29 17:50:01 UTC

Post by t***@tuxteam.de

Post by Curt
Owlett is a notorious troll who never listens to reason.

This is wrong, borderline defamatory. Richard Owlett is not a

Andy Smith:

It's not an authentic Owlett thread unless it contains an enormous
XY problem, a monomaniacal obsession with a solution already
part-dreamed up by the OP, several factual errors, and a constant
trickle of confounding small details that were never provided up
front, now delivered with glee.

IOW, a troll. So go fuck yourself, as you should have done years ago.

Defamatory. What are you, a fucking lawyer? Sue me then, you little snit.

t***@tuxteam.de

2024-06-29 18:00:01 UTC

On Sat, Jun 29, 2024 at 05:43:15PM -0000, Curt wrote:

[...]

Post by Curt
Defamatory. What are you, a fucking lawyer? Sue me then, you little snit.

Bad day today?

I can't help you. I'm out of this thread.

--
t

Curt

2024-06-29 18:50:01 UTC

Post by t***@tuxteam.de

Post by Curt
Defamatory. What are you, a fucking lawyer? Sue me then, you little snit.

Bad day today?

As usual, you cut all that was pertinent to your meretricious commentary
and left only what suited your brain-damaged hypocrisy.

BTW, eliding a succinct paragraph to leave only a misleading sentence is
just the kind of inept lack of honesty that is your pathetic trademark. As if
you knew how to post, which you manifestly do not, because you cut the gist
of my remark for dishonest reasons.

Richard Owlett is a troll from way, way back. Take it on board or go fuck
yourself.

Richard

2024-06-29 23:00:01 UTC

That's how you warrant your ban, idiot.

Post by t***@tuxteam.de

Post by Curt
Defamatory. What are you, a fucking lawyer? Sue me then, you little snit.

Bad day today?

As usual, you cut all that was pertinent to your meretricious commentary
and left only what suited your brain-damaged hypocrisy.
BTW, eliding a succinct paragraph to leave only a misleading sentence is
just the kind of inept lack of honesty that is your pathetic trademark. As if
you knew how to post, which you manifestly do not, because you cut the gist
of my remark for dishonest reasons.
Richard Owlett is a troll from way, way back. Take it on board or go fuck
yourself.

Greg Wooledge

2024-06-29 23:10:01 UTC

Post by Richard
That's how you warrant your ban, idiot.

Let it go. Don't keep pouring more fuel on the fire.

Add Curt to your killfile (or whatever your MUA calls your ban list).
He's already been banned by the list admins anyway, so your local ban
is just for when the global ban is lifted.

Will Mengarini

2024-06-29 23:40:01 UTC

Post by Richard
That's how you warrant your ban, idiot.

Don't get yourself banned, Richard.

Anybody else remember Erik Naggum?

Geert Stappers

2024-06-30 09:50:01 UTC

Post by Will Mengarini

Post by Richard
That's how you warrant your ban, idiot.

Don't get yourself banned, Richard.
Anybody else remember .... ......?

Assume the person moved on, became a better human.

So no reason to dig up old harm.

Groeten
Geert Stappers
DD

--
Silence is hard to parse

t***@tuxteam.de

2024-06-30 12:00:01 UTC

Post by Geert Stappers

Post by Will Mengarini

Post by Richard
That's how you warrant your ban, idiot.

Don't get yourself banned, Richard.
Anybody else remember .... ......?

Assume the person moved on, became a better human.

I do remember that person. And, while he was extremely
opinionated, to the point of being grating, he also was
very smart and did contribute a lot to the Lisp discussion
and to other diverse fields. Much more than most of us
around here.

In retrospect, I think he was treated unfairly.

And he died far too young.

Cheers

--
t

Vincent Lefevre

2024-06-29 15:10:01 UTC

Post by Michael Kjörling
Yes, it almost certainly can be done with a single sed (or other
similar tool) invocation where the regular expression matches
precisely what you want it to match. But unless this is something you
will do very often, I tend to prefer readability over being clever,
even if the readable version is somewhat less performant.

To match a range inside a regexp, $(rgxg range 1 119) is readable. :)

rgxg is provided by the package of the same name.

--
Vincent Lefèvre <***@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

David Wright

2024-06-29 19:00:01 UTC

Post by Vincent Lefevre

Post by Michael Kjörling
Yes, it almost certainly can be done with a single sed (or other
similar tool) invocation where the regular expression matches
precisely what you want it to match. But unless this is something you
will do very often, I tend to prefer readability over being clever,
even if the readable version is somewhat less performant.

To match a range inside a regexp, $(rgxg range 1 119) is readable. :)
rgxg is provided by the package of the same name.

Perhaps best to ignore the narrow focus on 119 in the OP.
For bible verses per chapter, the largest number is 176.
(An accidental choice of 119 might be explained by that
psalm having the most verses. Only Psalms requires three
digits as it happens; I think the runner-up has only about
half that.)

It would be tedious and error-prone to have to specify the
maximum range for each chapter. Different versions of the
bible don't even agree with each other on numbers of verses.

Cheers,
David.

t***@tuxteam.de

2024-06-29 04:50:01 UTC

Post by Richard Owlett
Pluma is my editor of choice.
*BUT* it can NOT handle Search and Replace operations involving regular
expressions.

I would be *very* surprised if an editor, in this day and age,
can't do regular expressions. Really.

Post by Richard Owlett
Emacs can. It has much verbose documentation.
But examples seem rather scarce.

Of course, Emacs is the best editor out there, by a long shot.
But learning it is a long and panoramic road. You should at
least have a rough idea that you want to take it.

Post by Richard Owlett
I need to replace ANY occurrence of
<span class="verse" id="V1">
thru [at most]
<span class="verse" id="V119">
by
<sup>
I'm reformatting a Bible stored in HTML format for a particular set of
vision impaired seniors (myself included). Each chapter is in its own file.
How do I open a file.

Two ways of skinning that cat:

- in a terminal, type "emacs <yourfilename>"
- in an open Emacs instance (be it terminal or GUI, your
choice), type C-x C-f (hold CTRL, then "x", while holding
CTRL then "f"). You get a prompt in the bottom line (the
so-called minibuffer), enter your file name there. You
get tab completions.

Then there are menus...

Post by Richard Owlett
Do the above replacement.

Go to the top of your buffer (this is what you would call
"your file": Emacs calls the things which hold your text
while you are on them "buffers").
Do M-x (hold Meta, most of the time your Alt key, then "x").
You get a prompt for a command. Enter "query-replace-regexp"
(you get tab completions, so "que" TAB "re" TAB should suffice,
roughly speaking). Enter the regular expression you're looking
for. Then ENTER, then your replacement.

Post by Richard Owlett
Save and close the file.

To save, C-x C-s. I don't quite know what you mean by
"close".

To quit Emacs, C-x C-c.

Now I don't quite understand what you mean above with your
example, and whether it can be expressed by a regular expression
at all, but that is for a second go.

First, find out whether your beloved Pluma can deliver. I'm
sure it can. Unless you want to embark in the Emacs adventure
(very much recommended, mind you, but not the most efficient
path to your problem at hand).

Cheers

--
t

Richard Owlett

2024-06-29 11:40:02 UTC

Post by t***@tuxteam.de

Post by Richard Owlett
Pluma is my editor of choice.
*BUT* it can NOT handle Search and Replace operations involving regular
expressions.

I would be *very* surprised if an editor, in this day and age,
can't do regular expressions. Really.

Post by Richard Owlett
Emacs can. It has much verbose documentation.
But examples seem rather scarce.

Of course, Emacs is the best editor out there, by a long shot.
But learning it is a long and panoramic road. You should at
least have a rough idea that you want to take it.

Definitely interested
I worked for DEC in the 70's. Though a tech in Power Supply
Engineering, I was exposed to TECO and have recently seen claims that
Emacs is TECO done right. I've been exposed to many editors since but
TECO is memorable.

Post by t***@tuxteam.de

Post by Richard Owlett
I need to replace ANY occurrence of
<span class="verse" id="V1">
thru [at most]
<span class="verse" id="V119">
by
<sup>
I'm reformatting a Bible stored in HTML format for a particular set of
vision impaired seniors (myself included). Each chapter is in its own file.
How do I open a file.

- in a terminal, type "emacs <yourfilename>"
- in an open Emacs instance (be it terminal or GUI, your
choice), type C-x C-f (hold CTRL, then "x", while holding
CTRL then "f"). You get a prompt in the bottom line (the
so-called minibuffer), enter your file name there. You
get tab completions.
Then there are menus...

Post by Richard Owlett
Do the above replacement.

Go to the top of your buffer (this is what you would call
"your file": Emacs calls the things which hold your text
while you are on them "buffers").
Do M-x (hold Meta, most of the time your Alt key, then "x").
You get a prompt for a command. Enter "query-replace-regexp"
(you get tab completions, so "que" TAB "re" TAB should suffice,
roughly speaking). Enter the regular expression you're looking
for. Then ENTER, then your replacement.

Post by Richard Owlett
Save and close the file.

To save, C-x C-s. I don't quite know what you mean by
"close".
To quit Emacs, C-x C-c.
Now I don't quite understand what you mean above with your
example, and whether it can be expressed by a regular expression
at all, but that is for a second go.

When searching for information on regular expressions I came across one
that did it by searching for
{"1 thru 9" OR "10 thru 99" OR "100 thru 999"} .
I lost the reference ;<

Post by t***@tuxteam.de
First, find out whether your beloved Pluma can deliver. I'm
sure it can. Unless you want to embark on the Emacs adventure
(very much recommended, mind you, but not the most efficient
path to your problem at hand).

I'm still essentially at the stage of flow-charting how I need to handle
individual chapters. As there are ~1000 chapters, I'll want to use something
that can handle macros eventually.
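
For the "~1000 chapters" case, a small script may do the job of an editor macro. The following is only a sketch in Python under stated assumptions: the chapter files are UTF-8, end in ".html" and sit in one directory; the names `VERSE_OPEN` and `convert_chapters` are made up for this example. As in the original question, only the opening tag is touched, and files are overwritten in place, so try it on a copy first.

```python
import re
from pathlib import Path

# Opening verse span with an id from V1 to V999 (the thread needs at most V119).
VERSE_OPEN = re.compile(r'<span class="verse" id="V[1-9][0-9]{0,2}">')


def convert_chapters(directory, pattern="*.html"):
    """Rewrite opening verse spans in each matching file; return changed file names."""
    changed = []
    for path in sorted(Path(directory).glob(pattern)):
        original = path.read_text(encoding="utf-8")
        updated = VERSE_OPEN.sub("<sup>", original)
        if updated != original:
            # Overwrites the file in place: keep a backup of the directory.
            path.write_text(updated, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Calling `convert_chapters("some/copy/of/the/chapters")` (a hypothetical path) returns the names of the files it modified, which makes it easy to spot chapters that were left untouched.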

Thank you.

Post by t***@tuxteam.de
Cheers

t***@tuxteam.de

2024-06-29 17:20:01 UTC

On Sat, Jun 29, 2024 at 06:37:23AM -0500, Richard Owlett wrote:

[...]

When searching for information on regular expressions I came across one that
did it by searching for
{"1 thru 9" OR "10 thru 99" OR "100 thru 999"} .
I lost the reference ;<

That would be something like ([1-9]|[1-9][0-9]|[1-9][0-9][0-9])
since [x-y] expresses a range of characters, the | does OR and
the () do grouping [1].

If you allow yourself to be a bit sloppy [2], and allow numbers
with leading zeros, many regexp flavors have the "limited count
operator" {min,max}, with which you might say [0-9]{1,3} (you
won't need the grouping here, since the repeat operator binds
strongly enough to not mess up the rest of your regexp).

CAVEAT IMPLEMENTOR: Depending on the flavor of your regexps, the
() and sometimes the | need a backslash in front to give them
their magic meaning. In Emacs they do, in Perl (and PCRE, which
is most probably the engine behind Pluma) they don't. In grep
(and sed) you can switch behavior with an option (-E was it,
IIRC).
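
To make the difference between the strict and the sloppy pattern concrete, here is a minimal sketch in Python, whose `re` module follows the Perl convention (bare parentheses and `|`). The sample line and the names `STRICT`, `SLOPPY` and `to_sup` are invented for the illustration, and only the opening tag is rewritten, so the closing `</span>` is left as it was.

```python
import re

# Strict form: 1-9, 10-99 or 100-999, so no leading zeros and no bare 0.
STRICT = re.compile(
    r'<span class="verse" id="V([1-9]|[1-9][0-9]|[1-9][0-9][0-9])">'
)

# Sloppy form with the counted-repetition operator: any one to three digits.
SLOPPY = re.compile(r'<span class="verse" id="V[0-9]{1,3}">')


def to_sup(pattern, text):
    """Replace every opening verse span matched by `pattern` with <sup>."""
    return pattern.sub("<sup>", text)


sample = (
    '<span class="verse" id="V7">7</span> '
    '<span class="verse" id="V007">x</span>'
)
print(to_sup(STRICT, sample))  # only the V7 span is rewritten
print(to_sup(SLOPPY, sample))  # both spans are rewritten
```

In Emacs syntax the strict alternation would be typed with backslashes, as \([1-9]\|[1-9][0-9]\|[1-9][0-9][0-9]\), and the counted repetition as [0-9]\{1,3\}.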

Cheers

[1] This grouping is (again, depending on your regexp flavour)
a "capturing grouping", meaning that you can refer later
to what was matched by the sub-expression in the parens.
There are also (flavor blah blah) non-capturing groupings.

[2] You always are somewhat sloppy with regexps. Actually you
are being sloppy already, since every classical textbook
will tell you that they totally suck at understanding
"nested stuff", which HTML is, alas. But under the right
conditions they can butcher it alright :-)

--
tomás

Richard Owlett

2024-07-05 12:10:02 UTC

Post by t***@tuxteam.de
[...]

When searching for information on regular expressions I came across one that
did it by searching for
{"1 thru 9" OR "10 thru 99" OR "100 thru 999"} .
I lost the reference ;<

That would be something like ([1-9]|[1-9][0-9]|[1-9][0-9][0-9])
since [x-y] expresses a range of characters, the | does OR and
the () do grouping [1].
If you allow yourself to be a bit sloppy [2], and allow numbers
with leading zeros, many regexp flavors have the "limited count
operator" {min,max}, with which you might say [0-9]{1,3} (you
won't need the grouping here, since the repeat operator binds
strongly enough to not mess up the rest of your regexp).
CAVEAT IMPLEMENTOR: Depending on the flavor of your regexps, the
() and sometimes the | need a backslash in front to give them
their magic meaning. In Emacs they do, in Perl (and PCRE, which
is most probably the engine behind Pluma) they don't. In grep
(and sed) you can switch behavior with an option (-E was it,
IIRC).
Cheers
[1] This grouping is (again, depending on your regexp flavour)
a "capturing grouping", meaning that you can refer later
to what was matched by the sub-expression in the parens.
There are also (flavor blah blah) non-capturing groupings.
[2] You always are somewhat sloppy with regexps. Actually you
are being sloppy already, since every classical textbook
will tell you that they totally suck at understanding
"nested stuff", which HTML is, alas. But under the right
conditions they can butcher it alright :-)

Looks like KDE's Kate is a viable solution for editing the particular HTML
files of interest. It seems to be an appropriate mix of Pluma's ease of
use and Emacs' power. And for some reason I had already installed it.

Max Nikulin

2024-06-29 11:40:04 UTC

Post by t***@tuxteam.de
Do M-x (hold Meta, most of the time your Alt key, then "x").
You get a prompt for a command. Enter "query-replace-regexp"

And to get help for this function

C-h f query-replace-regexp RET

To open user manual switch to the help buffer and press "i".

A side note since an answer to the asked question has been posted.

To manipulate HTML it is better to write a script in some
programming language, e.g. for Python there are lxml etree and
BeautifulSoup packages. This way it is easier to maintain valid document
structure with paired opening and closing tags.
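
As a rough illustration of keeping tags paired, here is a sketch that deliberately swaps in Python's standard-library `html.parser` instead of lxml or BeautifulSoup, so that it runs without extra packages. The class name `VerseSpanToSup`, the helper `convert` and the id pattern are assumptions made for this example; it remembers which `<span>` was turned into `<sup>` so that the matching `</span>` becomes `</sup>` while other spans are left alone.

```python
import re
from html.parser import HTMLParser

VERSE_ID = re.compile(r"V[1-9][0-9]{0,2}")  # V1 .. V999


class VerseSpanToSup(HTMLParser):
    """Rewrite <span class="verse" id="Vn">...</span> as <sup>...</sup>."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; untouched.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.span_stack = []  # one flag per open span: True if it became <sup>

    def handle_starttag(self, tag, attrs):
        attrs_d = dict(attrs)
        is_verse = (
            tag == "span"
            and attrs_d.get("class") == "verse"
            and VERSE_ID.fullmatch(attrs_d.get("id") or "") is not None
        )
        if tag == "span":
            self.span_stack.append(is_verse)
        # Non-verse tags are copied through exactly as they were written.
        self.out.append("<sup>" if is_verse else self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == "span" and self.span_stack and self.span_stack.pop():
            self.out.append("</sup>")
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def convert(html_text):
    """Return html_text with verse spans replaced by paired <sup> tags."""
    parser = VerseSpanToSup()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

The stack is the part a plain search and replace cannot provide: it lets a nested, unrelated `<span>` inside a verse keep its own `</span>`.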

I have not tried Emacs Lisp facilities for dealing with HTML.

Max Nikulin

2024-06-29 15:10:01 UTC

Post by Max Nikulin
To manipulate HTML it is better to write a script in some
programming language, e.g. for Python there are lxml etree and
BeautifulSoup packages. This way it is easier to maintain valid
document structure with paired opening and closing tags.
I have not tried Emacs lisp facilities for dealing with HTML.

open in Geany

[...]

click search select replace
copy paste selection into "search for"

By "Emacs *lisp* facilities for dealing with HTML" I meant something like
`libxml-parse-html-region'. Notice that I was suggesting against
search&replace.

Greg Wooledge

2024-06-29 19:30:02 UTC

Oh, I see what the question was.
There is "use regular expressions", "use multi line matching" in Geany
I'm not very good at regular expressions.
I'd probably do it 3 times
"search for" <span class="verse" id="V(...)">
"search for" <span class="verse" id="V(..)">
"search for" <span class="verse" id="V(.)">

There's more than one regular expression syntax, so the first step is
to figure out which *kind* of regular expression you're writing.

In a Basic Regular Expression (BRE), you can write "one to three
digits" as:

[[:digit:]]\{1,3\}

In an Extended Regular Expression (ERE), you'd remove the backslashes:

[[:digit:]]{1,3}

Some people would use [0-9] instead of [[:digit:]]. [0-9] should work
in any locale I'm aware of, but is theoretically less portable than
[[:digit:]]. If you're actually doing this by typing a regex into an
editor, then [0-9] might be preferred because it's easier to type. If
you're writing a program, you should probably go with [[:digit:]].
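
For anyone carrying this over to a script rather than grep or sed: Python's `re` module uses Perl-style syntax and has no POSIX bracket classes such as [[:digit:]]. The nearest spellings are `[0-9]` and `\d`, and they are not quite the same thing, as this small sketch shows (the variable names are made up for the example).

```python
import re

ASCII_DIGITS = re.compile(r"[0-9]{1,3}")         # only the characters 0-9
UNICODE_DIGITS = re.compile(r"\d{1,3}")          # any Unicode decimal digit
ASCII_ONLY_D = re.compile(r"\d{1,3}", re.ASCII)  # \d restricted to 0-9

arabic_indic = "\u0661\u0662\u0663"  # Arabic-Indic digits one, two, three

for name, rx in [
    ("[0-9]", ASCII_DIGITS),
    (r"\d", UNICODE_DIGITS),
    (r"\d + re.ASCII", ASCII_ONLY_D),
]:
    print(name, bool(rx.fullmatch("119")), bool(rx.fullmatch(arabic_indic)))
```

For verse ids written with plain ASCII digits, `[0-9]` is the least surprising choice in Python.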

Greg Wooledge

2024-06-30 13:30:02 UTC

got it thanks.
<span class="verse" id="V[0-7]{1,2}">
<sup>
<sup>
<sup>
<span class="verse" id="V19">
<span class="verse" id="V129">
<span class="verse" id="V138">

I don't know what you're trying to do, but ERE [0-7]{1,2} matches one-
or two-digit *octal* numbers (e.g. 5, 07, 72, 77) but not numbers that
contain the digits 8 or 9.

Do you have a book whose verses are enumerated in octal?
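
The effect is easy to check with a short script. This sketch uses Python's `re` module, where the same `[0-7]{1,2}` syntax applies; the function name `matched_ids` and the list of ids are made up for the test, the last three ids being the ones left unreplaced in the quoted output.

```python
import re

# Pattern from the quoted attempt: one or two digits, each in the range 0-7.
OCTAL_ONLY = re.compile(r'<span class="verse" id="V[0-7]{1,2}">')


def matched_ids(pattern, ids):
    """Return the ids whose full opening tag is matched by `pattern`."""
    return [
        verse_id
        for verse_id in ids
        if pattern.fullmatch(f'<span class="verse" id="{verse_id}">')
    ]


ids = ["V5", "V07", "V72", "V77", "V19", "V129", "V138"]
print(matched_ids(OCTAL_ONLY, ids))
```

V19 fails because of the 9, V138 because of the 8, and V129 because it has three digits as well as a 9.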

Andy Smith

2024-06-30 14:00:01 UTC

Hello,

Post by Greg Wooledge
Do you have a book whose verses are enumerated in octal?

No one clarified that this was the *Christian* Bible. 😀

Thanks,
Andy

Richard Owlett

2024-06-29 10:10:01 UTC

Post by Richard Owlett
Pluma is my editor of choice.
*BUT* it can NOT handle Search and Replace operations involving regular
expressions.
Emacs can. It has much verbose documentation.
But examples seem rather scarce.

nedit can handle regular expressions in search and replace operations.
I find nedit easier to use than emacs.

I've seen references to nedit before.
But circumstances require I use this system in its current configuration.

Thank you.
