RadzenDataGrid Filtering Error when '\' is present at the end of user's filter string

Apologies for the long post, I was doing enough debugging to put in a pull request to fix this bug before hitting a snag and figuring I would post here instead to get your opinions on the best way to resolve this problem.


The bug in brief: when trying to filter in a DataGrid (for example in simple mode), users may accidentally push the backslash key when trying to hit enter (they are right next to each other on US standard qwerty keyboards at least). This is an accident, but it shouldn't error out the page like it currently does if this accident occurs.

To replicate this bug:

  1. Go to the LoadData example
  2. Make it use the older .Where(args.Filter) in the LoadData's filtering case (will also need to put @using System.Linq.Dynamic.Core at the top of the page)
  3. Filter any string column to a string ending in a backslash, test\ for example.
  4. Get the exception: "InvalidOperationException: Unexpected end of string literal at position ##"

The error comes from the ScanStringLiteral() function in ExpressionLexer.cs:

Going Line by Line:

  1. Current character is: backslash. It enters this code block because of this.
  2. It advances, making the current character the double quote character (that is supposed to end the literal if it worked correctly).
  3. It uses ScanEscapeSequence which in effect: appends the double quote to the ongoing literal builder, and advances the current character to now be the right parenthesis (past the literal and into the larger filter string).
  4. Breaks the switch and continues the loop, which continues scanning and appending ) and later characters (which are out of the literal trying to be scanned).

I also notice offhand that similar issues may be present in the ScanCharacterLiteral() function since it uses identical code.


The reason I didn't put in a pull request:
The functions are designed to be parsing out literally typed out escape characters (ex: user types out First line\nSecond line), serializing them like "...Contains(\"First line\\nSecond line\")..." and equaling "First line\u000aSecond line". I think that this feature is cool, though poses a few possible concerns in my mind.

This specific bug arises due to one of those concerns, Lexical Ambiguity.

The current ScanEscapeSequence() (shown above) is operating on code stored inside a string, like this filter string: "x => (((((x == null) ? null : x.GroupEmail) ?? \"\").ToLower() ?? \"\").Contains(\"what \"about\" this\")" which is generated when a user types what "about" this in a filter box. This errors, and rightly so, since it makes the lexer believe that about is an identifier. Raises some red flags about code execution vulnerabilities and is counterintuitive for a user but it seems acceptable. If you wanted to get this working as a user and filter to a string with quotes in it, it's actually possible! You could instead type what \"about\" this, and it works great! But here is where the ambiguity lies. The filter string would read .Contains(\"what \\\"about\\\" this\"), and the filter string for the user entry test\ (like in my bug) would read .Contains(\"test\\\").

In the current lexer, reading \\\" corresponds to a double quote character in the resulting string literal token. This means when a user's filter string ends in /, the double quote that is supposed to end the string literal gets treated as a character in the literal. Thereby over-scanning and pulling in the rest of the filter string until it reaches the end of the string (or perhaps even the start of another string literal) which causes an error.

I'm not sure the best way to go about fixing this. I would imagine there are plenty of ways that a user could type something into a filter box and get an error, different combinations of \, ", and ' in various places in the string seem most obvious. Though not having an easy way to test it, I also wonder how a user might enter a ' or \ character literal and have it be treated correctly by the lexer. I have ranked some solutions that stood out to me.

  • Ideally the discussed ambiguity would be fixed and more robust tests would be added, which would resolve the problem entirely. This would involve changes to the serializer and lexer, but if possible this would be best.
  • A good but less idyllic solution to my problem would be to prevent the user from getting an error out of nowhere (just because they messed up and hit the \ key at the end) by moving that error to some other case during the lexing. If there is ambiguity in the text and the lexer has to make an arbitrary decision that's fine by me, and when that results in errors in less common use cases that would be acceptable. Similar to current logic, they just shouldn't be using those special characters in their filter boxes I guess. This would look like changes to ScanEscapeSequence() that would assume that \\\" is actually a backslash followed by the end of the string. You would lose the ability to put double quotes in the middle of your filter box input without erroring (same logic as backslash with invalid letter after it), but would no longer error when a user accidentally presses \ at the end of their input.
  • A band-aid solution would be fixing the exact exact problem I was having, namely that a backslash at the end of a filter box input causes an error. This would be something like an if block in ScanStringLiteral()'s case '\\': that does some sketchy checks to see if there is a parenthesis or something like that is immediately following a \\\" and if that is so just append a backslash and end the literal. The problem with this would be the detection of what qualifies, and if the structure of the filter string ever has a whitespace or a letter following the end of the string literal it would be basically impossible to detect. This kind of program logic also isn't great to include in a lexer.
  • Switch to using the newer .Where(grid.ColumnsCollection). Since there is no serializing and deserializing with this overload, there is no error that occurs here. I could of course do this, but it would suck for the functionality between the two ways of using .Where() to be so different when it doesn't need to be. This would also cause me to have to switch over all of the LoadData filtering code to use the new overload, and I'm not sure enough about what else makes the two different to do this without a bunch of extra testing. It's also important to note that grid.ColumnsCollection.ToFilterString() serializes the filter string identically, so any sort of logic that involves getting the filter string and then saving or modifying it for use later will continue to need the .Where(filterstring) overload and have this bug. I have a specific example of a time that I am doing this, but am leaving it out for simplicity.
  • Get rid of the escape character logic entirely. This would suck because it is a feature loss, so definitely not the best solution. However, I would imagine that users are not often using this feature in regards to filter boxes specifically much at all, and it would be a partial solution to the ambiguity. It is also a bad solution because the lexer as a whole should probably still be able to parse out these characters for non-filter box related lexing.

Thanks for your time in reading this post, let me know if you have any questions or clarifications about any of my explanations. Also, thanks in advance for helping me resolve this problem.

A quick test that would pass when this issue is resolved:

expression = ExpressionParser.ParsePredicate<ItemWithGenericProperty<string>>("p => p.Value == \"test\\\"");

func = expression.Compile();

Assert.True(func(new ItemWithGenericProperty<string> { Value = @"test\" }));

The filter expression should be valid C# code which "test\" isn't. We shouldn't add support for such cases in the parser itself.

Can you reproduce it without this?

I'm not exactly sure what you are asking, since you quoted my original post but the "test\" string you commented about only occurs in my later comment with the test, but to clarify:

The bug I am talking about is caused when the user accidentally types in something like test\ into a DataGrid column's filter box, trying to filter that column to only rows containing the text "test" (such as a row with "testing build 5.7"). They didn't mean to put the backslash in there, but it is right above the enter key and easy to hit if you aren't careful or are typing fast on an unfamiliar keyboard. When this does happen, it shouldn't cause an error. There shouldn't be a keyboard button that, if pressed, ends their circuit from an exception. This is my concern, since it is easy for a user to do on accident.

The rest of my post was about why this bug was happening, and some possible ways to fix it.

As for any invalid c# code concerns, I don't see them. The flow is like this:

  1. User types in: test\
  2. Serializes as filter string: "x => (((((x == null) ? null : x.GroupEmail) ?? \"\").ToLower() ?? \"\").Contains(\"test\\\")"
  3. Causes exception during LoadData filtering on line containing .Where(args.Filter)

If you are talking about my comment with the test, it could be rephrased as follows to match with the existing test:

public void Should_ParseEscapeSequenceInStringLiteral()
{
    var expression = ExpressionParser.ParsePredicate<ItemWithGenericProperty<string>>(@"p => p.Value == ""\n""");

    var func = expression.Compile();

    Assert.True(func(new ItemWithGenericProperty<string> { Value = "\n" }));

    expression = ExpressionParser.ParsePredicate<ItemWithGenericProperty<string>>(@"p => p.Value == ""test\""");

    func = expression.Compile();

    Assert.True(func(new ItemWithGenericProperty<string> { Value = "test\\" }));
}

Please let me know if I didn't answer your question, or am not understanding something about your question or the serialization/deserialization involved.

I tested entering Test\ in our online demo and didn't see any exception. This is why I asked you why you need the Where(args.Filter).

The following isn't valid C# when you unquote it:

@"p => p.Value == ""test\"""

This won't compile:

 Expression<Func<ItemWithGenericProperty<string>, bool>> expression = x => x.Value == "test\";

It should either be:

x => Value == "test\\" or x => Value == @"test\";

In any case if you must use Where(args.Filter) you can escape the \ with \\ - Where(args.Filter.Replace(@"\", @"\\"). Users would rarely manage to enter a valid escape sequence anyway.

Oh I see. Yes the .Where(args.Filter) is needed because that is what causes the bugs, and they do not occur without it. The example on the demo uses the newer .Where(grid.ColumnsCollection) overload that does not have this same problem. As for why I want to use .Where(args.Filter), I have at least 2 pages that requires it's use and a bunch of others that I haven't migrated to the new overload. If the .Where(args.Filter) method should no longer be used, please let me know. If not, the functionality between the two .Where overloads shouldn't differ this majorly.

For my test, if you wanted to unquote it for some reason you would have to escape the backslash properly. It is already valid C# as far as I understand. It would be equivalent to x => Value == "test\\" as you said. Just a difference in how you type it as C# or as a string containing C# code. The filter string generator is producing this same valid code for the filter string, it is the lexing that is the problem. I have this test passing just fine with a band-aid fix for the lexer bug I used while debugging.


The solution you provided would works as a very specific workaround for sure, but in my original post I described how it is part of a larger issue with the lexer's escape parsing. Like you said, users rarely manage to enter a valid escape sequence. Yet, that same rare functionality causes an error if they accidentally enter an invalid one (which can be done often).

Your solution definitely maneuvers past the lexer bug, since it produces a filter string (as actual runnable c# for clarity) like: x => ((((x == null) ? null : x.LastName) ?? "").ToLower() ?? "").Contains("test\\"). Since the bugged lexer logic skips over 1 of the 2 backslashes completely, that would indeed filter to the user's typed text of test\ and more importantly will not error. However, it also has some problems that make it far from a comprehensive solution:

  • How would you handle it if a user enters something like John "Johnny" Johnson or test" as a filter. The double quote has a wider (involves ambiguous filter string generation) but similar problem as the backslash does, where its a key that the user's just aren't allowed to press without erroring their page. Plus, you can't simply replace out the quotes like the backslashes, since the filter string contains a bunch of double quotes.
  • The fix suggested would, as you mentioned, mean the loss of ability to use escape characters. This also prevents the existing workaround for the previous point that involves typing in John \"Johnny\" Johnson as a filter and having it work properly.

While it is a great band-aid solution, the actual two intertwined bugs still exist when using .Where(args.Filter):

  1. User entered double quotes are serialized into the filter string in a way that is ambiguous. This causes an error rather than some graceful handling.
  2. The lexer does not parse backslashes correctly when a user enters them, which results in an error rather than some graceful handling.

Basically, I would expect that when a user enters the following text into a filter box,
John "Johnny" Johnson\
They would get the following result string for sql:
John "Johnny" Johnson\
Exactly what they typed should be what they are filtering for.

Given your new insights and further thought, I thought of one additional solution that would be easier to implement and fairly clean. It would be like an interception as you are providing the user's entered text to the filter string generator. If you took the user's literally entered text, and placed a backslash in front of every entered double-quote/backslash, it would work properly without any changes to the lexer. It works in a very similar way to your workaround, where the extra backslash would get skipped by the lexer and the following character treated as a literal. Equivalent to using the workaround I discussed earlier to make it not error by entering John \"Johnny\" Johnson\\ and resulting in sql with John "Johnny" Johnson\. Though of course this quick and easy of a solution has to be done carefully, because it would disable the use of actual special characters like \n without better detection to exclude those. It would also start breaking again if double backslashes were put in, etc. so it wouldn't be a complete solution, but could be a good partial solution.

Anyone have any thoughts on the best way to resolve this? I've converted as many as I can to the new .Where() overload, but still have a few pages that require the FilterString version which has this issue.

To prevent the specific bug where ending the filter string with a backslash causes an over scan, the following code from the ExpressionLexer:

Could be replaced with something like the following code, which actually reads the next character instead of skipping it. This is the extremely specific cause of the bug I noticed that started this, the backslash causes the lexer to notice the improper escape sequence \" at the end of the string literal. Then the lexer skips over the double quotes entirely and starts scanning into the parenthesis and other parts of the filter string until erroring. This change would cause improper escape sequences to print both the backslash and the following character:

case '\'':
case '"':
    break;
case '\\':
    Advance(-2);
    ch = Peek();
    Advance(1);
    break;