Regular Expressions in Notepad++ or SciTE

I like Notepad++ and SciTE (for Windows and Linux respectively) and use them as my editors of choice for many projects. Both are built on the Scintilla editing component. Today I was going through some HTML forms, the kind with a multitude of option elements for state, country, etc. Imagine you had to get all of the state names and country names out of those option elements. There are plenty of ways, but a regex find and replace in Notepad++ makes it pretty easy. (Granted, you could do this in just about any editor worth its salt, but here I would like to show you how to do it in Notepad++.)

Basic regex

OK, so you have a list of HTML options and you want to get the values without the HTML:

<option value="AFGHANISTAN">AFGHANISTAN</option><option value="ALBANIA">ALBANIA</option><option value="ALGERIA">ALGERIA</option>

etc....

You could go lift a list off of someone else's website, or you could easily get the values with Notepad++. Just paste the HTML into Notepad++ then hit Ctrl+H or go to Search -> Replace.

  1. Check the option box in the lower left for Regular expression.
  2. Type this regex in the Find box: <option value="[0-9a-zA-Z_&. -]+">
  3. Make sure the Replace box is empty. (You want to replace each match with nothing.)
  4. Select Replace All.
  5. Check the option box for 'Extended (\n, \r, \t, \0, \x...)'.
  6. Type </option> in the Find box.
  7. Type \r\n in the Replace box.
  8. Select Replace All.
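
If you would rather script it, the two passes above can be sketched in Python. This is only an illustration of the same idea, not what Notepad++ does internally. The variable names are made up for the example, and the character class is written with the hyphen last so that it is taken literally instead of as a range.

```python
import re

# Sample input, copied from the option list above.
html = ('<option value="AFGHANISTAN">AFGHANISTAN</option>'
        '<option value="ALBANIA">ALBANIA</option>'
        '<option value="ALGERIA">ALGERIA</option>')

# Pass 1: remove every opening tag (regex replace with an empty string).
without_open = re.sub(r'<option value="[0-9a-zA-Z_&. -]+">', '', html)

# Pass 2: turn each closing tag into a line break (the "Extended" step).
as_lines = without_open.replace('</option>', '\r\n')

# One name per line; splitlines() drops the line endings.
countries = as_lines.splitlines()
print(countries)
```

The second pass is a plain string replace because, as in the Extended search mode, nothing in it needs regex matching.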

Hooray, now you have a list of countries (or whatever you had options for), one per line.

Regex with find and replace

Now to get a little more advanced, let's say we want to put the value we find into a capture group so we can use it in our replace query. Fortunately, this is quite easy to do. As an example, say we have a list of e-mail addresses such as this:

email@drupal.org
email2@drupal.org
email3@drupal.org

and the list goes on. Now, let's say we want to surround each e-mail address with quotes and place a comma after it, basically putting the list in a CSV-style format. If you had hundreds of e-mails this would be a lot of typing, but with regex replace it is simple. Go to Find & Replace (Ctrl+H) and make sure the box for regular expressions is checked. For the find you can use

\([a-zA-Z0-9@.]+\)

You should recognize the middle part in the brackets as a list of characters to search for, as we did above. What's new here is the capture group, marked by the parentheses. In SciTE you have to escape the parentheses with backslashes, as shown above. I'm not sure whether that is needed in Notepad++, so if the above doesn't work, try removing the \ characters. Now, in the Replace box put:

'\1',

The \1 is the first capture (and our only capture in this example) from the find query. Our find query grabbed email addresses, so now each email address will be replaced with surrounding quotes and a comma, and our list now looks like:

'email@drupal.org',
'email2@drupal.org',
'email3@drupal.org',
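
For comparison, here is a minimal Python sketch of the same capture-and-replace. It assumes Python's re syntax, where capture groups use plain (unescaped) parentheses, so it differs from the SciTE form shown above. The variable names are only for illustration.

```python
import re

# The sample list from above, one address per line.
emails = "email@drupal.org\nemail2@drupal.org\nemail3@drupal.org"

# The parentheses capture each address; \1 in the replacement refers
# back to that first (and only) capture group.
quoted = re.sub(r"([a-zA-Z0-9@.]+)", r"'\1',", emails)

print(quoted)
```

The newline between addresses is not in the character class, so each address is matched and replaced separately.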

You can do much more with regular expressions in Notepad++. I'd suggest taking a look at this blog as well as the SciTE RegEx manual, and of course getting this book from Amazon.

Comments

Regex Issues in NotePad++

The problem with regex in Notepad++ is that it is riddled with bugs. For example, try searching for newlines using regex: if you have any \n in your regex, it will not match anything. Also, * and + don't work with groups for some reason. For example, the regex (ab)+ will not match anything.

What's worse is that regex bug reports for Notepad++ tend to get ignored (or hardly looked at), so waiting for fixes is not feasible...

So all in all, Notepad++ is good for everything BUT regex... In fact, most free editors are poor at regex. The only exception seems to be Textpad, which I think is the exact opposite of Notepad++. It is good at regex but poor at everything else... lol...

But not free. :-) If it only got some solid updates I would go back to it in a heartbeat.

http://www.scintilla.org

http://www.scintilla.org/SciTERegEx.html
"Note that \r and \n are never matched because in Scintilla, regular expression searches are made line per line (stripped of end-of-line chars)."
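
To make the quoted behaviour concrete, here is a rough Python sketch of what "line per line, stripped of end-of-line chars" implies. It is an illustration only, not Scintilla's actual code, and the function name search_line_by_line is made up for the example.

```python
import re

def search_line_by_line(pattern, text):
    """Return (line_number, matched_text) pairs, searching one line at a time."""
    hits = []
    # splitlines() drops the end-of-line characters, so the pattern
    # never gets to see \r or \n.
    for number, line in enumerate(text.splitlines(), start=1):
        match = re.search(pattern, line)
        if match:
            hits.append((number, match.group(0)))
    return hits

text = "first\r\nsecond\r\n"

# A pattern that needs a line break finds nothing in per-line mode...
newline_hits = search_line_by_line(r"t\r\ns", text)
# ...while a pattern that fits inside one line still works.
word_hits = search_line_by_line(r"second", text)
# Searching the whole text at once does find the line break.
whole_text_hit = re.search(r"t\r\ns", text)

print(newline_hits, word_hits, whole_text_hit is not None)
```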

How to crash the current version of Notepad++: do a regular expression search for $, which the documentation lists as the character for "new line".

Hehe, well, I have to say, it was never difficult to crash Notepad++ =)

Oh god I just did this....

notepad++ regex bugs

Yep. Stuff doesn't work. Seems it would be easy to fix, given the existence of excellent regex libraries.

I use UltraEdit for my regex work. Not free, but I've got a lifetime free update deal from its early days, and the bugs have been wrung out of its regex.

Bugs fixed

Checked the latest version of Notepad++ and searched for \r, \n and $ with only the RegEx box ticked, and it found "end of line" :-)

find and replace with regex using notepad++

Replacing with \1 inserts an empty string instead of the original string.

I have some strings which I need to update:

app.getString("email2@drupal.org")
app.getString("email3@drupal.org")

to, for example:

encode(app.getString("email2@drupal.org"))

Find regex: app.getString\(\"[a-zA-Z0-9@.]+\"\)
Replace string: encode(app.getString("\1"))

Output: encode(app.getString(""))

If there is a solution, please reply.
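
One likely explanation, offered as a guess and not tested in Notepad++: the find pattern above has no capture group, so \1 has nothing to refer to. The sketch below shows the idea in Python's re syntax, where \( and \) match literal parentheses and plain ( ) form a group. Note that Python raises an error for a missing group instead of inserting an empty string. The variable names are only for illustration.

```python
import re

# The two sample lines from the question.
source = ('app.getString("email2@drupal.org")\n'
          'app.getString("email3@drupal.org")')

# The outer, unescaped parentheses capture the whole call as group 1;
# the escaped ones match the literal parentheses of the call itself.
pattern = r'(app\.getString\("[a-zA-Z0-9@.]+"\))'

# Wrap whatever was captured in encode(...).
updated = re.sub(pattern, r'encode(\1)', source)

print(updated)
```

Capturing the whole call keeps the replacement short, since the original text never has to be retyped in the replace string.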
