Newbie RegExp Question

Gedd
Technical Admin
Posts: 2812
Joined: Wed Oct 13, 2004 12:00 am

Post by Gedd »

First, this is all in ASP, but I think RegExps are used in a lot of different languages, so I'm looking for whatever help I can get.

I've avoided RegExp for a long time, since it mostly looks like some sort of mysterious voodoo magic to me, but I'm finally taking the plunge and trying to get some stuff working with a couple RegExp routines because they just make the most sense.

I got my first process to work, and now I'm working on a second one. For some reason, the second one is returning results that don't make a whole lot of sense to me, so I thought I'd see if someone could explain it, and possibly also help me get it to do what I'm looking to do.

Here's my code:

<%
	str = "delim item1 delim some text delim item2 delim some other text delim item3 delim yet more text delim"
	
	'setup regexp
	Set objRegExp = New RegExp
	objRegExp.Pattern = "delim.*delim"
	'objRegExp.Pattern = "(delim)(.*)(delim)"
	objRegExp.Global = True
	
	Set Matches = objRegExp.Execute(str)
	
	For Each Match in Matches
		'doubled quotes ("") inside a VBScript string literal produce one literal quote character
		Response.Write "<br><br>Text: """ & Match.Value & """ found at position " & Match.FirstIndex & "<br>"
	Next
%>
And here's my result:

Text: "delim item1 delim some text delim item2 delim some other text delim item3 delim yet more text delim" found at position 0
I've tried both pattern statements in the code and get the same result.

What I expected, after realizing that I'd made an error in my code, was for every "delim whatever delim" combination to show up in the Matches collection, but I only get the one. Why is that?

Part 2 for bonus points:

As you might have guessed from the original string, I'm looking for a way to grab the text that corresponds to an item, with the delims acting as delimiters. So what I'd like to be able to do is look for something like:

delim item delim .* delim

So if I was looking for item2, I would set the pattern to:

delim item2 delim .* delim

And get it to return only:

delim item2 delim some other text delim

Which would allow me to parse out the text by using the Mid function combined with some Len calls.

Any ideas or thoughts? I'm sure this must be something like RegExp101, but I've never taken that class. :)

Post by Gedd »

Being the impatient (and just a little lucky) guy that I am, I managed to stumble across the solution:

<%
	str = "delim item1 delim some text delim item2 delim some other text delim item3 delim yet more text delim"
	
	'setup regexp
	Set objRegExp = New RegExp
	objRegExp.Pattern = "(?:delim\s)(item2)(?:\sdelim\s)(.*?)(?:\sdelim)"
	objRegExp.Global = True
	
	Set Matches = objRegExp.Execute(str)
	
	For Each Match in Matches
		nm = Match.Submatches(0)
		val = Match.Submatches(1)
		Response.Write "Match: " &  Match.Value & "<br>"
		Response.Write "Name: " &  nm & "<br>"
		Response.Write "Value: " &  val & "<br>"
		Response.Write "<br><br>"
	Next
%>
Value ends up just as I want it, "some other text".

Looks like the expression was just being "greedy", as it's apparently called: ".*" grabs as much text as it can. Putting a ? after the ".*" makes it match as little as possible instead. The really nice thing is that in the course of doing this, I found a way to also get the value I wanted without parsing the match string with a Mid statement. Huzzah!
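
To make the greedy versus lazy difference concrete, here is a minimal sketch. It uses Python's `re` module rather than the VBScript RegExp object from the thread; that swap is an assumption made for illustration, though both engines treat `.*` as greedy and `.*?` as lazy. It reuses the sample string from the first post.

```python
import re

# Sample string from the first post.
text = ("delim item1 delim some text delim item2 delim some other text "
        "delim item3 delim yet more text delim")

# Greedy: .* grabs as much as it can, so the match runs from the first
# "delim" to the last one and swallows every delimiter in between.
greedy = re.findall(r"delim.*delim", text)

# Lazy: .*? stops at the nearest following "delim".
lazy = re.findall(r"delim.*?delim", text)

print(len(greedy), greedy[0] == text)
for match in lazy:
    print(match)
```

Matches never overlap, so the lazy version pairs up the delimiters around each item name and skips the text that sits between those pairs. That is why a pattern that wants the value as well has to spell out all three delimiters around the item.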
ChrisGwinn
Posts: 10396
Joined: Wed Oct 13, 2004 7:23 pm
Location: Rake Trinket

Post by ChrisGwinn »

The short answer is that delim(.*?)delim should do what you want. The ? tells the parser to repeat as few times as possible.

You don't need to use different patterns to find the different items, since you can iterate through the collection of matches.

Is your delimiter actually the text 'delim'? Generally these things are just a single character. Assuming that your delimiter is, I'd just use System.String.Split() anyway.
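
A rough sketch of the Split idea, written in Python (`str.split`) instead of .NET's `System.String.Split()` or VBScript's `Split` function; the language swap and the variable names are assumptions for illustration only. It splits the sample string on the literal text `delim` and pairs up alternating fields.

```python
text = ("delim item1 delim some text delim item2 delim some other text "
        "delim item3 delim yet more text delim")

# Splitting on the delimiter leaves an empty field at each end, because the
# string starts and ends with "delim". Strip whitespace, then drop both ends.
fields = [field.strip() for field in text.split("delim")][1:-1]

# Even positions hold item names, odd positions hold the text that follows.
names = fields[0::2]
values = fields[1::2]

for name, value in zip(names, values):
    print(name, "->", value)
```

This only works while the fields strictly alternate between name and value; a missing value would shift every pair after it.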

Post by ChrisGwinn »

Oops. Looks like you didn't need my reply after all.

Post by Gedd »

Thanks for the answer, Chris...
ChrisGwinn wrote: Is your delimiter actually the text 'delim'?
No, that was just my test case to make sure that there wasn't some special character issue going on. I'm actually using a two-character delimiter, "©®". I know a single character is more common, but I don't yet know what text might be running through here, so I need to make it as unlikely as possible that the delimiter is actually part of the intended text. I've got the delimiter stored in a variable, so I can change it if necessary.
ChrisGwinn wrote: Assuming that your delimiter is, I'd just use System.String.Split() anyway.
I considered using Split to make an array out of the string, but it was my intention to load up the string with a bunch of name value pairs, where I'd be retrieving the value based on the name. In other environments I'd use an array that simply let me pull up the value by referencing the name (something like array[#name]), but I couldn't find anything similar in the ASP book that I'm looking through.
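
For what it's worth, here is a hedged sketch of the lookup-by-name idea with the delimiter kept in a variable. It is in Python, not ASP, and `DELIM`, `parse_pairs`, and the sample string are all made-up names and data for illustration. In classic ASP the closest built-in is, as far as I know, the `Scripting.Dictionary` object. The delimiter is escaped before it goes into the pattern, in case it ever contains a character that means something to the regex engine.

```python
import re

DELIM = "©®"  # the two-character delimiter mentioned above

def parse_pairs(text, delim=DELIM):
    """Return a dict mapping each item name to the text that follows it."""
    d = re.escape(delim)  # neutralize any regex metacharacters in the delimiter
    # Name, then value. The lookahead checks for the closing delimiter without
    # consuming it, so the same delimiter can open the next name/value pair.
    pattern = re.compile(d + r"(.*?)" + d + r"(.*?)(?=" + d + r")", re.DOTALL)
    return {name.strip(): value.strip() for name, value in pattern.findall(text)}

# Hypothetical input in the same shape as the test string from the first post.
sample = "©®item1©®some text©®item2©®some other text©®item3©®yet more text©®"
pairs = parse_pairs(sample)
print(pairs["item2"])
```

One pass over the string builds the whole table, so there is no need to rebuild the pattern for each item name.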

Post by Gedd »

Chris (or anyone else), any chance you can answer one more question for me?

I've got the following read in and stored as a variable:

some stuff

<!--If errorMsg-->{errorMsg}<!--End If-->

<!--If errorMsg2-->
{errorMsg2}
<!--End If-->

some other stuff
Using "<!--If\s*(.*?)\s*-->(.*?)<!--End If-->" as the pattern, I get a match on the first (errorMsg), but no match on the second (errorMsg2). Is there something that defaults to "don't go to the next line" and if so, is there some way around it? I can go with the first situation if I have to, but it'd be better if I can put line breaks in as well.

[edit]Minor correction to the pattern being used.

Post by Gedd »

Never mind, got this one too. :)

Replacing "(.*?)" with "([\s\S]*?)" does the trick. Didn't realize "." was only for everything except "\n".
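
A small sketch of the difference, again in Python's `re` module as a stand-in for the VBScript RegExp object (an assumption for illustration; both treat "." as "anything except a line break" by default). The template text is the one from the previous post.

```python
import re

template = """some stuff

<!--If errorMsg-->{errorMsg}<!--End If-->

<!--If errorMsg2-->
{errorMsg2}
<!--End If-->

some other stuff"""

# "." cannot cross a line break, so the block spread over three lines is missed.
dot_pattern = re.compile(r"<!--If\s*(.*?)\s*-->(.*?)<!--End If-->")

# [\s\S] means "whitespace or non-whitespace", i.e. any character at all.
any_pattern = re.compile(r"<!--If\s*(.*?)\s*-->([\s\S]*?)<!--End If-->")

dot_names = [m.group(1) for m in dot_pattern.finditer(template)]
any_names = [m.group(1) for m in any_pattern.finditer(template)]

print(dot_names)
print(any_names)
```

Keeping the quantifier lazy (`*?`) matters even more here: a greedy `[\s\S]*` would run from the first `<!--If` to the last `<!--End If-->` in the whole template.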

Post by ChrisGwinn »

.NET's equivalent to a Perl associative array is System.Collections.Hashtable. You can use any object you like as the key or the value.

Post by Gedd »

Thanks, Chris. If I have some time, I'll go back and write some test scripts to see if I can get it working with a Hashtable.

Post by ChrisGwinn »

Gedd wrote:Never mind, got this one too. :)

Replacing "(.*?)" with "([\s\S]*?)" does the trick. Didn't realize "." was only for everything except "\n".
That's the default behavior, but you can disable it with the s option. Just wrap your whole pattern in an option group, like (?s:(.*?))
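
Here is a hedged sketch of the s option in an engine that has it, using Python's `re` module (3.6 or later for the scoped group). Whether the same syntax works in the VBScript RegExp object is a separate question: as far as I know that object only exposes Pattern, Global, IgnoreCase, and MultiLine, so the `[\s\S]` workaround may still be the one to use in classic ASP.

```python
import re

template = "<!--If errorMsg2-->\n{errorMsg2}\n<!--End If-->"
pattern = r"<!--If\s*(.*?)\s*-->(.*?)<!--End If-->"

# Default behavior: "." stops at "\n", so nothing matches.
plain = re.search(pattern, template)

# The s option as a flag: "." now matches "\n" too.
flagged = re.search(pattern, template, re.DOTALL)

# The s option as a scoped option group wrapped around the whole pattern.
inline = re.search("(?s:" + pattern + ")", template)

print(plain)
print(repr(flagged.group(2)))
print(inline.group(1))
```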

Post by ChrisGwinn »

ChrisGwinn wrote:
Gedd wrote:Never mind, got this one too. :)

Replacing "(.*?)" with "([\s\S]*?)" does the trick. Didn't realize "." was only for everything except "\n".
That's the default behavior, but you can disable it with the s option. Just wrap your whole pattern in an option group, like (?s:(.*?))
Heh.