Apologies for the long post, I was doing enough debugging to put in a pull request to fix this bug before hitting a snag and figuring I would post here instead to get your opinions on the best way to resolve this problem.
The bug in brief: when trying to filter in a DataGrid (for example in simple mode), users may accidentally push the backslash key when trying to hit enter (they are right next to each other on US standard qwerty keyboards at least). This is an accident, but it shouldn't error out the page like it currently does if this accident occurs.
To replicate this bug:
- Go to the LoadData example
- Make it use the older
.Where(args.Filter)in the LoadData's filtering case (will also need to put@using System.Linq.Dynamic.Coreat the top of the page) - Filter any string column to a string ending in a backslash,
test\for example. - Get the exception: "InvalidOperationException: Unexpected end of string literal at position ##"
The error comes from the ScanStringLiteral() function in ExpressionLexer.cs:
Going Line by Line:
- Current character is: backslash. It enters this code block because of this.
- It advances, making the current character the double quote character (that is supposed to end the literal if it worked correctly).
- It uses ScanEscapeSequence which in effect: appends the double quote to the ongoing literal builder, and advances the current character to now be the right parenthesis (past the literal and into the larger filter string).
- Breaks the switch and continues the loop, which continues scanning and appending
)and later characters (which are out of the literal trying to be scanned).
I also notice offhand that similar issues may be present in the ScanCharacterLiteral() function since it uses identical code.
The reason I didn't put in a pull request:
The functions are designed to be parsing out literally typed out escape characters (ex: user types out First line\nSecond line), serializing them like "...Contains(\"First line\\nSecond line\")..." and equaling "First line\u000aSecond line". I think that this feature is cool, though poses a few possible concerns in my mind.
This specific bug arises due to one of those concerns, Lexical Ambiguity.
The current ScanEscapeSequence() (shown above) is operating on code stored inside a string, like this filter string: "x => (((((x == null) ? null : x.GroupEmail) ?? \"\").ToLower() ?? \"\").Contains(\"what \"about\" this\")" which is generated when a user types what "about" this in a filter box. This errors, and rightly so, since it makes the lexer believe that about is an identifier. Raises some red flags about code execution vulnerabilities and is counterintuitive for a user but it seems acceptable. If you wanted to get this working as a user and filter to a string with quotes in it, it's actually possible! You could instead type what \"about\" this, and it works great! But here is where the ambiguity lies. The filter string would read .Contains(\"what \\\"about\\\" this\"), and the filter string for the user entry test\ (like in my bug) would read .Contains(\"test\\\").
In the current lexer, reading \\\" corresponds to a double quote character in the resulting string literal token. This means when a user's filter string ends in /, the double quote that is supposed to end the string literal gets treated as a character in the literal. Thereby over-scanning and pulling in the rest of the filter string until it reaches the end of the string (or perhaps even the start of another string literal) which causes an error.
I'm not sure the best way to go about fixing this. I would imagine there are plenty of ways that a user could type something into a filter box and get an error, different combinations of \, ", and ' in various places in the string seem most obvious. Though not having an easy way to test it, I also wonder how a user might enter a ' or \ character literal and have it be treated correctly by the lexer. I have ranked some solutions that stood out to me.
- Ideally the discussed ambiguity would be fixed and more robust tests would be added, which would resolve the problem entirely. This would involve changes to the serializer and lexer, but if possible this would be best.
- A good but less idyllic solution to my problem would be to prevent the user from getting an error out of nowhere (just because they messed up and hit the \ key at the end) by moving that error to some other case during the lexing. If there is ambiguity in the text and the lexer has to make an arbitrary decision that's fine by me, and when that results in errors in less common use cases that would be acceptable. Similar to current logic, they just shouldn't be using those special characters in their filter boxes I guess. This would look like changes to ScanEscapeSequence() that would assume that
\\\"is actually a backslash followed by the end of the string. You would lose the ability to put double quotes in the middle of your filter box input without erroring (same logic as backslash with invalid letter after it), but would no longer error when a user accidentally presses \ at the end of their input. - A band-aid solution would be fixing the exact exact problem I was having, namely that a backslash at the end of a filter box input causes an error. This would be something like an if block in ScanStringLiteral()'s
case '\\':that does some sketchy checks to see if there is a parenthesis or something like that is immediately following a\\\"and if that is so just append a backslash and end the literal. The problem with this would be the detection of what qualifies, and if the structure of the filter string ever has a whitespace or a letter following the end of the string literal it would be basically impossible to detect. This kind of program logic also isn't great to include in a lexer. - Switch to using the newer
.Where(grid.ColumnsCollection). Since there is no serializing and deserializing with this overload, there is no error that occurs here. I could of course do this, but it would suck for the functionality between the two ways of using .Where() to be so different when it doesn't need to be. This would also cause me to have to switch over all of the LoadData filtering code to use the new overload, and I'm not sure enough about what else makes the two different to do this without a bunch of extra testing. It's also important to note thatgrid.ColumnsCollection.ToFilterString()serializes the filter string identically, so any sort of logic that involves getting the filter string and then saving or modifying it for use later will continue to need the.Where(filterstring)overload and have this bug. I have a specific example of a time that I am doing this, but am leaving it out for simplicity. - Get rid of the escape character logic entirely. This would suck because it is a feature loss, so definitely not the best solution. However, I would imagine that users are not often using this feature in regards to filter boxes specifically much at all, and it would be a partial solution to the ambiguity. It is also a bad solution because the lexer as a whole should probably still be able to parse out these characters for non-filter box related lexing.
Thanks for your time in reading this post, let me know if you have any questions or clarifications about any of my explanations. Also, thanks in advance for helping me resolve this problem.
