Fixing XPath injection
Data-driven ASP.NET Core web applications can store information and records in XML databases. One way of navigating the nodes of an XML document is with XPath.
Developers can mistakenly construct XPath queries dynamically from untrusted data. This oversight can allow an attacker to execute an arbitrary query or retrieve sensitive data from the XML database.
In this recipe, we will fix the XPath injection vulnerability in code.
Getting ready
Using Visual Studio Code, open the sample Online Banking app folder at \Chapter02\xpath-injection\before\OnlineBankingApp\.
This example uses the following XML data:
<?xml version="1.0" encoding="utf-8"?>
<knowledgebase>
    <knowledge>
        <topic lang="en">Types of Transfers</topic>
        <description lang="en">
            Make transfers from checking and savings to:
            Checking and savings
            Make transfers from line of credit to:
            Checking and savings
        </description>
        <tags>transfers, transferring funds</tags>
        <sensitivity>Public</sensitivity>
    </knowledge>
    <knowledge>
        <topic lang="en">Expedited Withdrawals</topic>
        <description lang="en">
            Expedited withdrawals are available to our
            executive account holders.
            You may reach out to Stanley Jobson at
            [email protected]
        </description>
        <tags>withdrawals, expedited withdrawals</tags>
        <sensitivity>Confidential</sensitivity>
    </knowledge>
</knowledgebase>
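To make the risk concrete before we start, here is a minimal sketch of what an attacker can do against this data. It assumes a .NET 6 or later console project (top-level statements), uses a trimmed copy of the XML above, and builds the same kind of concatenated query that the Search method in this recipe uses. The CountMatches helper and the payload string are illustrative assumptions, not code from the sample app.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

// Trimmed copy of the sample knowledge base (attributes and descriptions omitted).
const string xml = @"<knowledgebase>
  <knowledge>
    <topic>Types of Transfers</topic>
    <tags>transfers, transferring funds</tags>
    <sensitivity>Public</sensitivity>
  </knowledge>
  <knowledge>
    <topic>Expedited Withdrawals</topic>
    <tags>withdrawals, expedited withdrawals</tags>
    <sensitivity>Confidential</sensitivity>
  </knowledge>
</knowledgebase>";

var doc = new XmlDocument();
doc.LoadXml(xml);
XPathNavigator nav = doc.CreateNavigator();

// Hypothetical helper: builds the query by string concatenation (unsafe on purpose).
int CountMatches(string input)
{
    string query = "//knowledge[tags[contains(text(),'" + input +
                   "')] and sensitivity/text()='Public']";
    return nav.Select(query).Count;
}

// Closes the contains() call early and adds an "or" branch that is always true.
string payload = "')] or tags[contains(text(),'";

Console.WriteLine(CountMatches("transfers"));   // 1: the public article
Console.WriteLine(CountMatches("withdrawals")); // 0: the confidential article is filtered out
Console.WriteLine(CountMatches(payload));       // 2: the sensitivity filter is bypassed
```

Because and binds more tightly than or in XPath, the injected predicate reads as "has tags, or (has tags and is Public)", so every knowledge node matches, including the confidential one.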
How to do it…
Let's take a look at the steps for this recipe:
- Launch Visual Studio Code and open the starting exercise folder by typing the following command:
code .
- Navigate to Terminal | New Terminal in the menu or simply press Ctrl + Shift + ` in Visual Studio Code.
- Type the following command in the terminal to build the sample app to confirm that there are no compilation errors:
dotnet build
- Open the Services/KnowledgebaseService.cs file and locate the vulnerable part of the code in the Search method:
    public List<Knowledge> Search(string input)
    {
        List<Knowledge> searchResult = new List<Knowledge>();
        var webRoot = _env.WebRootPath;
        var file = System.IO.Path.Combine(webRoot, "Knowledgebase.xml");

        XmlDocument XmlDoc = new XmlDocument();
        XmlDoc.Load(file);

        XPathNavigator nav = XmlDoc.CreateNavigator();
        XPathExpression expr = nav.Compile(
            @"//knowledge[tags[contains(text(),'" + input +
            "')] and sensitivity/text()='Public']");
        var matchedNodes = nav.Select(expr);
        // code removed for brevity
An XPath expression is dynamically created by concatenating the user-controlled input. Because the input parameter is neither validated nor sanitized, a malicious actor can inject a crafted string into the XPath query and change the intent of the whole expression.
- To fix this security bug, let's refactor the code and implement input sanitization based on the whitelisting technique. To start, add a reference to both the System and System.Linq namespaces:
    using System;
    using System.Linq;
- Add a new method to the KnowledgebaseService class and name it Sanitize:
    private string Sanitize(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            throw new ArgumentNullException("input", "input cannot be null");
        }

        HashSet<char> whitelist = new HashSet<char>(
            @"1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ");

        return string.Concat(input.Where(i => whitelist.Contains(i)));
    }
- Call the new Sanitize method, passing the input parameter to it as an argument. Assign the result to the sanitizedInput variable:
    public List<Knowledge> Search(string input)
    {
        string sanitizedInput = Sanitize(input);
        List<Knowledge> searchResult = new List<Knowledge>();
        var webRoot = _env.WebRootPath;
        var file = System.IO.Path.Combine(webRoot, "Knowledgebase.xml");

        XmlDocument XmlDoc = new XmlDocument();
        XmlDoc.Load(file);

        XPathNavigator nav = XmlDoc.CreateNavigator();
        XPathExpression expr = nav.Compile(
            @"//knowledge[tags[contains(text(),'" + sanitizedInput +
            "')] and sensitivity/text()='Public']");
        // code removed for brevity
The custom Sanitize method now removes unnecessary and possibly dangerous characters from the input string. The result is stored in the sanitizedInput variable, and only that value is concatenated into the XPath expression, so injected XPath syntax can no longer alter the query.
How it works…
As we learned in the Input sanitization section of Chapter 1, Secure Coding Fundamentals, input sanitization is a defensive technique that removes suspicious characters from user-supplied input. This approach prevents the application from processing unwanted XPath syntax injected into the query.
We have created the new Sanitize method that serves as our sanitizer. Inside this method is a whitelist of allowed characters and a lambda expression that keeps only the characters of input found in the whitelist, dropping everything else:
HashSet<char> whitelist = new HashSet<char>(
    @"1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ");

return string.Concat(input.Where(i => whitelist.Contains(i)));
Searching for a help article with a disallowed character does not throw an exception. The character is simply dropped before the query is built, so our sample Online Banking web application never processes it.
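The following is a minimal sketch of that behavior, assuming a .NET 6 or later console project. Sanitize is a standalone copy of the method from this recipe written as a local function, while BuildQuery and the payload string are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Standalone copy of the recipe's whitelist filter.
static string Sanitize(string input)
{
    if (string.IsNullOrEmpty(input))
    {
        throw new ArgumentNullException(nameof(input), "input cannot be null");
    }

    // Only digits, ASCII letters, and the space character survive.
    var whitelist = new HashSet<char>(
        "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ");

    return string.Concat(input.Where(c => whitelist.Contains(c)));
}

// Hypothetical helper: the query text that Search would build from a raw value.
static string BuildQuery(string value) =>
    "//knowledge[tags[contains(text(),'" + Sanitize(value) +
    "')] and sensitivity/text()='Public']";

// An injection attempt that tries to close contains() and add an "or" branch.
string payload = "')] or tags[contains(text(),'";

Console.WriteLine(Sanitize("expedited withdrawals")); // expedited withdrawals
Console.WriteLine(Sanitize(payload));                 // " or tagscontainstext" (with a leading space)
Console.WriteLine(BuildQuery(payload));
// //knowledge[tags[contains(text(),' or tagscontainstext')] and sensitivity/text()='Public']
```

With the quotes, brackets, and parentheses stripped, the payload stays inside the string literal and is treated as an ordinary search term. The trade-off of this whitelist is that legitimate punctuation is removed as well; for example, line-of-credit becomes lineofcredit. An empty or null search term still throws ArgumentNullException, so the caller may want to handle that case.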
There's more…
An alternative fix is to parameterize the XPath query. We will define a variable that will serve as a placeholder for an argument. This technique allows the data to be separated from code:
XmlDocument XmlDoc = new XmlDocument();
XmlDoc.Load(file);

XPathNavigator nav = XmlDoc.CreateNavigator();
XPathExpression expr = nav.Compile(
    @"//knowledge[tags[contains(text(),$input)] and sensitivity/text()='Public']");

XsltArgumentList varList = new XsltArgumentList();
varList.AddParam("input", string.Empty, input);

CustomContext context = new CustomContext(new NameTable(), varList);
expr.SetContext(context);

var matchedNodes = nav.Select(expr);
foreach (XPathNavigator node in matchedNodes)
{
    searchResult.Add(new Knowledge()
    {
        Topic = node.SelectSingleNode(nav.Compile("topic")).Value,
        Description = node.SelectSingleNode(nav.Compile("description")).Value
    });
}
In the preceding code, the XPath expression is modified so that the $input variable is a placeholder for the previously concatenated input value. We also use an XsltArgumentList object to hold the input argument and pass it to the custom context of the XPathExpression. In this way, the XPath query is parameterized and protected from malicious injection upon execution.
Note
This mitigation requires the creation of a user-defined custom context class that derives from XsltContext. Other classes are also required to make this XPath parameterization possible. The class files are included in the sample solution, namely Services\XPathExtensionFunctions.cs, Services\XPathExtensionVariable.cs, and Services\CustomContext.cs. The whole guide and source for these classes are also available in the official .NET documentation: https://docs.microsoft.com/en-us/dotnet/standard/data/xml/user-defined-functions-and-variables.
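To give a feel for what such a context class involves, here is a heavily reduced sketch, assuming a .NET 6 or later console project. SketchContext and SketchVariable are hypothetical stand-ins written for this example; they are not the CustomContext and XPathExtensionVariable classes shipped with the sample solution, and they support variables only, not extension functions. The XML is a trimmed copy of the sample data.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

var doc = new XmlDocument();
doc.LoadXml(@"<knowledgebase>
  <knowledge>
    <tags>transfers, transferring funds</tags>
    <sensitivity>Public</sensitivity>
  </knowledge>
  <knowledge>
    <tags>withdrawals, expedited withdrawals</tags>
    <sensitivity>Confidential</sensitivity>
  </knowledge>
</knowledgebase>");
XPathNavigator nav = doc.CreateNavigator();

// Hypothetical helper: runs the parameterized query for one search term.
int CountMatches(string input)
{
    XPathExpression expr = nav.Compile(
        "//knowledge[tags[contains(text(),$input)] and sensitivity/text()='Public']");
    var args = new XsltArgumentList();
    args.AddParam("input", string.Empty, input);
    expr.SetContext(new SketchContext(new NameTable(), args));
    return nav.Select(expr).Count;
}

Console.WriteLine(CountMatches("transfers"));                     // 1
Console.WriteLine(CountMatches("')] or tags[contains(text(),'")); // 0

// Hypothetical minimal context: resolves $name from an XsltArgumentList.
class SketchContext : XsltContext
{
    private readonly XsltArgumentList _args;

    public SketchContext(NameTable nameTable, XsltArgumentList args) : base(nameTable)
    {
        _args = args;
    }

    public override IXsltContextVariable ResolveVariable(string prefix, string name)
        => new SketchVariable(_args.GetParam(name, string.Empty));

    // This sketch defines no custom XPath functions.
    public override IXsltContextFunction ResolveFunction(
        string prefix, string name, XPathResultType[] argTypes) => null!;

    public override bool Whitespace => true;
    public override bool PreserveWhitespace(XPathNavigator node) => false;
    public override int CompareDocument(string baseUri, string nextbaseUri) => 0;
}

// Hypothetical variable wrapper: hands the stored value to the XPath engine.
class SketchVariable : IXsltContextVariable
{
    private readonly object _value;

    public SketchVariable(object value) { _value = value; }

    public bool IsLocal => false;
    public bool IsParam => true;
    public XPathResultType VariableType => XPathResultType.Any;
    public object Evaluate(XsltContext xsltContext) => _value;
}
```

The injection string that would bypass a concatenated query is now compared as plain text against the tags, so it matches nothing. A production version would also need to handle a variable that was never added to the argument list, which this sketch does not check.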