What is an XSS attack

An XSS (Cross-Site Scripting) attack is a code injection attack in which an attacker injects malicious code into a website. The attack succeeds if the browser interprets the injected data as code. Note that the name is something of a misnomer: the attack does not have to cross sites, and the injection of any script into a page is called XSS. The injected code always executes in the victim's browser.

How does XSS work

HTTP GET requests, where the query string passes input parameters to a web API, and HTTP form POST requests can both be used for code injection.

Query string
```
<html> 
<a href="https://some-insecure-website.com/search-page?searchstring=<script>/*some script*/</script> ">  <img src="http://attacker-site/click_for_discount.jpg"> </a>
</html>
```
Clicking the link above (the user simply sees an image) sends an HTTP GET request to the search-page API with the input parameter searchstring set to a script chosen by the attacker. This can lead to malicious code being injected into the application. Note that
- The user could click on this link on another malicious website.
- The user could have received this link via email.
- The user could have received this link via social media.
HTML form
- Suppose a web application accepts input from a user and displays that input on a web page. For instance, it could be something as basic as asking the user for their name and displaying a greeting with that name.
- Instead of entering a name, a user could provide input like <script>alert(1)</script>.
- If the user input is embedded in an HTML context without encoding, then, when the browser receives the server response, it executes the alert function and shows an alert box.
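
The greeting scenario above can be sketched in a few lines. This is only an illustration of the vulnerable pattern: the function name `renderGreetingUnsafe` and the page markup are assumptions, not code from any real application.

```javascript
// Hypothetical server-side template that concatenates user input into HTML
// without any encoding. This is the vulnerable pattern, shown for illustration.
function renderGreetingUnsafe(name) {
  return "<p>Hello, " + name + "!</p>";
}

const benign = renderGreetingUnsafe("Alice");
const malicious = renderGreetingUnsafe("<script>alert(1)</script>");

console.log(benign);    // the name appears as plain text
console.log(malicious); // the input reaches the page as a real script element
```

Because the input is concatenated as-is, the browser has no way to tell the injected script element apart from the application's own markup.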

An attacker could use a script to fetch the cookies stored in the victim’s browser and send them to the attacker’s machine. The attacker could then use those cookies for session hijacking.

Even without the script tag, the attacker can inject HTML elements. Event handlers such as onclick and onmouseover on the injected element can trigger malicious script, for example

<img src="http://attackersite.com/someimage.jpg" onmouseover=alert(document.cookie);>

Types of XSS

Reflected XSS (non-persistent XSS). The malicious code is injected into a single request and is temporary; it affects the user who sends that request. Typically, an application takes some input from an HTTP request (GET or POST) and embeds that input in the immediate response without sanitizing it. Note that the web server does not save the injected input. For example
- A web application has a search page that receives the user-supplied search term as a query string: https://some-insecure-website.com/search-page?searchstring=some data to be searched. The search-page endpoint echoes the supplied search term in the response, for example: You have searched for: some data to be searched
- An attacker has the following HTML in their web app
```
<html> 
<a href="https://some-insecure-website.com/search-page?searchstring=<script>/*some script*/</script> ">  click here </a>
</html>
```
- The attacker has created a malicious link with a script in the URL. If a user clicks this link, the web API receives the malicious script and returns it as part of the HTML response, and the script then executes in the user's browser.

Note that reflected XSS is much more difficult to detect, since these attacks target a user directly and the payload is not stored in a database. For example, an attacker can create a malicious URL and send it to the user via email or web-based advertisements.
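
A minimal sketch of the reflected flow described above, assuming a hypothetical handler `searchPageUnsafe` that echoes the `searchstring` parameter without encoding. The handler and its response markup are illustrative only.

```javascript
// Hypothetical handler for /search-page that echoes the search term unencoded.
function searchPageUnsafe(requestUrl) {
  const term = new URL(requestUrl).searchParams.get("searchstring") || "";
  return "<html><body>You have searched for: " + term + "</body></html>";
}

// The attacker builds the link; percent-encoding keeps the payload intact in transit.
const payload = "<script>/*some script*/</script>";
const maliciousLink =
  "https://some-insecure-website.com/search-page?searchstring=" +
  encodeURIComponent(payload);

// The server decodes the parameter and reflects it straight back.
const response = searchPageUnsafe(maliciousLink);
console.log(response);
```

Nothing is stored on the server here: the payload travels in the request and comes back in the immediate response.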

Stored XSS (persistent XSS). The malicious code is injected into the server (typically via HTTP GET/POST) and is permanent, typically stored in the application's database. An example is user comments that are stored in the database: a malicious user may input script tags with JavaScript code, which the server stores and sends to every user who views the comments. For example, the user could type this.
```
This is a user comment. Along with the text, a script tag is sent, which will be interpreted as executable script rather than text.
<script>
alert(1);
</script>
```
Note that one user injected the script through the comments input, the input was stored on the server, and the HTML response received by other users now contains the malicious script. In this example, once another user visits the comments page, no further input is required for the injected script to execute. Also, while the script here is harmless (it only shows an alert), a real payload typically runs behind the scenes and may be used, for example, to steal a session cookie.
- Blind cross-site scripting is a variant of stored XSS. Here the attacker's malicious script is saved by the web server and executed in another part of the application or in another application (unlike a normal stored XSS attack, where the script executes in the page from which it was submitted, e.g. the comments page). For instance, the script may be submitted from a feedback page and executed when it is viewed in the admin module. For this reason, blind XSS is harder to detect than reflected and normal stored XSS.
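
The stored flow can be sketched as follows. The in-memory array stands in for the application's database, and the function names `saveComment` and `renderCommentsUnsafe` are hypothetical.

```javascript
// In-memory stand-in for the application's database (an assumption for this sketch).
const commentStore = [];

function saveComment(author, text) {
  commentStore.push({ author, text }); // stored as-is, with no sanitization
}

function renderCommentsUnsafe() {
  return commentStore
    .map((c) => "<li><b>" + c.author + "</b>: " + c.text + "</li>")
    .join("\n");
}

saveComment("alice", "Nice article!");
saveComment("mallory", "Great post <script>alert(1)</script>");

// Any later visitor receives the same stored payload.
const pageForBob = renderCommentsUnsafe();
const pageForCarol = renderCommentsUnsafe();
console.log(pageForBob);
```

The payload is submitted once and then served to every visitor, which is what makes this variant persistent.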
DOM-based XSS. In DOM-based XSS the malicious code is never injected into the server; you could say that DOM-based XSS is client-side XSS. The attack payload is therefore not present in the server response, in contrast to stored or reflected XSS, where the payload is placed in the response page because of a server-side flaw. Instead, the injection happens in the browser DOM. Note that user input in the query string is accessible to page scripts via location.search, which returns the query string part of a URL, including the question mark (?). Similarly, window.location.hash returns the anchor part of a URL, including the hash sign (#). The page script has to ensure that malicious code is not injected via the URL. DOM-based XSS is very hard to deal with: because the payload does not need to reach a server, it is very difficult to detect with static analysis tools or other popular scanners. Multiple browsers and browser versions further complicate matters. Read more: https://owasp.org/www-community/attacks/DOM_Based_XSS
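
The following sketch shows why a fragment-based payload never appears on the server. It uses the standard `URL` class in place of `window.location` so that it can run outside a browser; the host `example.test` and the helper `readFragment` are assumptions.

```javascript
// Reads the fragment the way a page script might read window.location.hash.
function readFragment(urlString) {
  const hash = new URL(urlString).hash; // includes the leading "#"
  return decodeURIComponent(hash.slice(1));
}

const link =
  "https://example.test/welcome#" +
  encodeURIComponent("<img src=x onerror=alert(1)>");

// What the page script sees.
const fragment = readFragment(link);

// What the server sees: the fragment is not part of the request target.
const parsed = new URL(link);
const requestTarget = parsed.pathname + parsed.search;

console.log(fragment);
console.log(requestTarget);
```

A vulnerable page would then assign `fragment` to an unsafe sink such as innerHTML, and the injection would happen entirely in the browser.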
Mutation-based XSS. Mutation-based XSS uses filter-safe payloads that mutate into unsafe payloads after they have passed filtering.

XSS risks

Loss of session cookies. HttpOnly cookies are typically safe from XSS attacks (document.cookie cannot be used to read HttpOnly cookies). However, XST (cross-site tracing) attacks may still be possible: the attacker uses XSS to inject a script that makes the victim's browser send an HTTP TRACE request to the destination web server, which returns a response containing the original HTTP request in its body. Since the headers of the TRACE request include the victim's session cookie, the cookie can be read from the TRACE response and sent to the attacker's site. Note that modern browsers do not allow TRACE requests from script.
Malicious script can take any action that the user can (delete/post/update). The script may be running in the browser of an authenticated user, so authentication cookies are sent along with its requests. Hence, even without stealing cookies, the script can take malicious actions such as deleting the user's posts or comments on a blogging website. So while HttpOnly cookies cannot be stolen via XSS, user actions can still be performed from the browser in which the genuine user is logged in.
Malicious script can view sensitive user info like passwords, and credit card information.
Monitoring user actions.
Modify dom / Rewriting the contents of page.
- This can have serious ramifications. For example, the displayed price of a stock could be modified.
Download of trojans / external content.
Redirecting users to a malicious website.

The code injected by XSS executes as legitimate application code, giving the attacker full control over the web application running in the user's browser. In other words, the entire application is compromised, with any operation possible (read, write, or monitoring; server side or client side).
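
As a small illustration of the HttpOnly protection mentioned in the cookie risk above, the sketch below builds a `Set-Cookie` header value. The cookie name, the value, and the choice to add `Secure` are assumptions for the example, not requirements stated in this article.

```javascript
// Builds a Set-Cookie header value. The cookie name and value are placeholders.
function buildSessionCookie(name, value) {
  return [
    name + "=" + encodeURIComponent(value),
    "Path=/",
    "HttpOnly", // not readable through document.cookie
    "Secure",   // assumption: the site is served over HTTPS
  ].join("; ");
}

const header = buildSessionCookie("sessionId", "abc123");
console.log("Set-Cookie: " + header);
```

As noted above, this protects the cookie value itself but does not stop an injected script from acting on behalf of the logged-in user.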

Attack vectors commonly utilized in XSS attacks

script : a script tag can be used to reference external JavaScript code
body : attributes of this tag can be used to refer to a script
img : used by attackers to execute JavaScript code
iframe : allows the attacker to embed another HTML page into the current page
input : can be used directly to execute script
link : this tag can contain a script, instead of the normal use of linking to external style sheets
div : the background attribute of this tag can be used to refer to a script
object : scripts from an external site can be included using this tag
JavaScript events : another popular XSS vector used by attackers. Event attributes such as "onerror" and "onload" can be applied to a variety of tags.

How to prevent XSS attacks

Validate/Sanitize all user input: Whenever accepting any data, ensure that its format is what you expect. Any user input that is used as part of HTML output introduces a risk of XSS, so validate user input to catch malicious values.
- Server-side sanitization: Web application firewalls (WAFs) are commonly used to mitigate XSS and other injection attacks such as SQL injection. Most cloud providers offer a WAF. For instance, you can deploy AWS WAF on Amazon CloudFront as part of your CDN solution, on the Application Load Balancer that fronts your web servers or origin servers running on EC2, on Amazon API Gateway for your REST APIs, or on AWS AppSync for your GraphQL APIs. Read more: https://aws.amazon.com/waf/. Google provides Cloud Armor, which offers preconfigured WAF rules (SQLi and XSS). Read more: https://cloud.google.com/blog/products/identity-security/understanding-google-cloud-armors-new-waf-capabilities. Note that to evade a WAF, an attacker may encode string characters, or encode the script in base64 and place it in a meta tag. Read more: https://owasp.org/www-community/attacks/xss. It is better to use a tried, tested, and regularly maintained library on the server side, as attackers can fool simple filters with techniques like hex encoding and Unicode character variations. Java developers can use https://owasp.org/www-project-java-encoder/. A WAF alone is not sufficient to protect against XSS, as new WAF bypass techniques keep appearing. Additionally, WAFs cannot block DOM-based XSS.
- Client-side sanitization: DOMPurify is a lightweight, JavaScript-based HTML sanitizer.
- Sanitization and the javascript: pseudo-scheme. URLs that use javascript: as the scheme, instead of http: or https:, pose an XSS risk. When a browser sees such a URL in an HTML document, it typically treats it as JavaScript code to be executed. If the URL is dynamically generated from user input, there is a clear XSS risk. Developers should analyze application code for dynamically generated URLs, which are typically found in the href and src attributes of HTML elements. Whenever a URL is created with untrusted data, make sure that it is safe to use. Frameworks like React, Vue, and Angular try to detect or sanitize such URLs. Angular allows only known safe URLs, whereas React and Vue block known unsafe URLs. In general, looking for bad values is a security antipattern, as it only blocks known bad values. Angular's approach of allowing only known safe values is better.
- In cases where a web application needs to render dynamic HTML code (based on user input), output encoding cannot be used, and sanitizing user input is the only solution.
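
The allowlist idea for URL schemes described above could look roughly like this. It is a minimal sketch, not the logic of any particular framework; the helper `isSafeUrl`, the allowed protocol set, and the base URL `https://example.test/` are assumptions.

```javascript
// Allowlist of schemes that are acceptable in href/src values for this sketch.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:"]);

// baseUrl is a hypothetical origin used to resolve relative links.
function isSafeUrl(candidate, baseUrl = "https://example.test/") {
  try {
    // The URL parser normalizes case and strips tabs, newlines and leading spaces,
    // so disguised schemes are compared in their canonical form.
    const parsed = new URL(candidate, baseUrl);
    return ALLOWED_PROTOCOLS.has(parsed.protocol);
  } catch (err) {
    return false; // unparsable input is rejected
  }
}

console.log(isSafeUrl("/relative/path"));
console.log(isSafeUrl("JaVaScRiPt:alert(1)"));
```

Checking the parsed protocol against known good values avoids having to enumerate every disguised spelling of javascript:.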
Use safe sinks: Untrusted content can be assigned to a safe sink, as it will always be interpreted as text, not code.
- e.g. .textContent
- safe HTML attributes include align, alink, alt, bgcolor, border, cellpadding, cellspacing, class, color, cols, colspan, coords, dir, face, height, hspace, ismap, lang, marginheight, marginwidth, multiple, nohref, noresize, noshade, nowrap, ref, rel, rev, rows, rowspan, scrolling, shape, span, summary, tabindex, title, usemap, valign, value, vlink, vspace, width.
Set suitable response headers: For example, for JSON, verify that the Content-Type header is application/json and not text/html to prevent XSS.
Content Security Policy (CSP) can be used to reduce the chances of an XSS attack. It is delivered through the Content-Security-Policy response header and controls the valid sources of executable scripts; for instance, inline script execution can be blocked via CSP.
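
The two header-based measures above can be sketched together. The policy shown is only an example: the directive values and the helper names `buildCsp` and `jsonResponseHeaders` are assumptions, and a real policy depends on where the application loads its scripts from.

```javascript
// Serializes a policy object into a Content-Security-Policy header value.
function buildCsp(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => name + " " + sources.join(" "))
    .join("; ");
}

// Hypothetical policy: scripts may only come from the page's own origin.
// Inline scripts are blocked because 'unsafe-inline' is not listed.
const policy = {
  "default-src": ["'self'"],
  "script-src": ["'self'"],
  "object-src": ["'none'"],
};

// Headers for a JSON endpoint: an explicit JSON content type plus the CSP.
function jsonResponseHeaders() {
  return {
    "Content-Type": "application/json; charset=utf-8",
    "Content-Security-Policy": buildCsp(policy),
  };
}

const headers = jsonResponseHeaders();
console.log(headers);
```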
Use escaping/encoding of output to ensure that any user-supplied data is passed into the DOM only as strings. The best defense against XSS is for the browser never to interpret data as code. Use an output escaping/encoding technique appropriate to where the user input is displayed; in other words, output encoding is context-sensitive. The contexts are
- HTML / rendering context (this context is associated with the parsing of HTML tags and their attributes)
  - HTML Attribute
  - URL
  - CSS
- Javascript/execution context (associated with the parsing and execution of script code)
  - HTML Subcontext (subcontext within execution context)
    - Attributes like innerHTML and functions like document.write which can be used to write HTML content with help of javascript constitute the HTML subcontext within the execution context.
  - HTML Attribute subcontext (subcontext within execution context)

Note that HTML, JavaScript, and CSS all need to be escaped differently, as the browser parses them differently. You cannot simply escape everything, or your own scripts and HTML markup will stop working. Escaping ensures that user input is interpreted as text and not code: you are telling the browser that the data should be treated as text only. Hence, even if an injection attempt gets through, it is rendered as text rather than executed. It is always better to use an escaping library (for example https://owasp.org/www-project-enterprise-security-api/).

  - Handling HTML context
    - If the user input (untrusted data) is to be inserted into an HTML context (between HTML opening and closing tags), use HTML encoding to do so safely, so that the HTML is treated as text. For example
```

<div>'If this data is user provided/ untrusted, it must be HTML-escaped.'</div>

<b>'If this data is user provided/ untrusted, it must be HTML-escaped.'</b>

<p>'If this data is user provided/ untrusted, it must be HTML-escaped.'</p>

<span>'If this data is user provided/ untrusted, it must be HTML-escaped.'</span>
```
      - Here are examples of encoded values for specific characters.
        
        & &amp;
        
        < &lt;
        
        > &gt;
        
        " &quot;
        
        ' &#x27;
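
The character mappings above translate directly into a small encoder. This is a minimal sketch for the HTML context only (between opening and closing tags); the name `escapeHtml` is an assumption, and a maintained encoding library is preferable in production.

```javascript
// The five characters listed above and their HTML entity encodings.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
};

function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// The payload is rendered as visible text instead of being parsed as a tag.
const safeDiv = "<div>" + escapeHtml('<script>alert("x")</script>') + "</div>";
console.log(safeDiv);
```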
  - Handling HTML subcontext within execution context: properties like innerHTML and functions like document.write, which write HTML content from JavaScript, constitute the HTML subcontext. It is recommended to HTML-escape and then JavaScript-escape untrusted data before inserting it into the HTML subcontext. Also, if JavaScript is being used to write content into the page, use the .textContent property, as it is a safe sink (the value is always treated as text), unlike innerHTML.
```
 
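<script>
// Unsafe: unencoded user input assigned to innerHTML is parsed as HTML
document.getElementById("mydiv").innerHTML = userInput;

// Encoded: HTML-encode, then JavaScript-encode (server-side template)
document.getElementById("mydiv").innerHTML = "<%=ESAPI.encoder().encodeForJavaScript(ESAPI.encoder().encodeForHTML(userInput))%>";

// Safe sink: textContent treats the unencoded user input as text
document.getElementById("mydiv").textContent = userInput;
</script>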
```
  - Handling HTML attribute context
    - If the user input is used as a value for an unsafe HTML attribute in the rendering context then HTML attribute encoding is required.
```
<div id="mydiv" unsafeattribute="<%=ESAPI.encoder().encodeForHTMLAttribute(userinput)%>">
</div>
```
      HTML encoding and HTML attribute encoding differ. Read more at https://stackoverflow.com/questions/13246540/html-and-attribute-encoding.
    - Note that the attribute value should be surrounded by quotation marks. This makes an XSS attack harder.
    - When setting an HTML attribute that is safe, there is no risk of XSS. Examples of safe attributes are width, height, etc. These safe attributes never treat their values as code.
    - If the attribute is an event handler like onclick, multiple levels of encoding are required. To safely embed user input inside HTML event handlers, you need to handle both the JavaScript context and the HTML context. So first Unicode-escape the input, and then HTML-encode it:
      - <a href="#" onclick="x='this content needs two layers of escaping'">example</a>
      - Read More: https://portswigger.net/web-security/cross-site-scripting/preventing
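
The two layers for event handler attributes can be sketched as below. Both helpers are simplified assumptions written for this example: `escapeJsString` Unicode-escapes every non-alphanumeric character, and `escapeHtml` then encodes the characters that are significant in HTML.

```javascript
// Layer 1: JavaScript string escaping. Every character outside [A-Za-z0-9]
// becomes a \uXXXX escape (one escape per UTF-16 code unit).
function escapeJsString(value) {
  let out = "";
  for (const unit of String(value).split("")) {
    out += /[A-Za-z0-9]/.test(unit)
      ? unit
      : "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0");
  }
  return out;
}

// Layer 2: HTML encoding of the characters that matter inside an attribute value.
function escapeHtml(value) {
  return String(value).replace(
    /[&<>"']/g,
    (ch) => "&#x" + ch.charCodeAt(0).toString(16) + ";"
  );
}

// Unicode-escape first, then HTML-encode, then place inside the quoted JS string.
function buildOnclickLink(userInput) {
  const jsSafe = escapeJsString(userInput);
  return '<a href="#" onclick="x=\'' + escapeHtml(jsSafe) + '\'">example</a>';
}

console.log(buildOnclickLink("'; alert(1); //"));
```

With this aggressive first layer the HTML layer has little left to change, but keeping both follows the order recommended above and stays correct if the first layer is later relaxed.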
  - Handling HTML attribute subcontext within the execution context
    - If JavaScript is being used to set the value of an attribute, then .setAttribute is a safe sink only to the extent that it automatically HTML-attribute-encodes. Hence only JavaScript encoding should be done. For example,
```
<script>

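// Correct: setAttribute already attribute-encodes, so JavaScript-encode only
document.getElementById("mydiv").setAttribute("unsafeattribute", encodeForJavaScript(userInput));

// Incorrect: the extra HTML attribute encoding causes double encoding
document.getElementById("mydiv").setAttribute("unsafeattribute", encodeForJavaScript(encodeForHTMLAttribute(userInput)));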
</script>
```
    - If the attribute is href or another URL attribute, then URL encoding followed by JavaScript encoding is required. For example,
    - ```
    var x = document.createElement("a");
    x.setAttribute("href", '<%=ESAPI.encoder().encodeForJavaScript(ESAPI.encoder().encodeForURL(userRelativePath))%>');
```
- Note that .setAttribute is not safe if the attribute is an event handler like "onclick". Even if JavaScript encoding is done, setAttribute is still not safe (the attribute value will be converted to JavaScript and evaluated). More on this later in the article.
```
    
    document.getElementById("button").setAttribute("onclick", encodeForJavaScript(userInput)); 
```
- When setting an HTML attribute that is safe, there is no risk of XSS. Examples of safe attributes are width, height, etc.
- Read more: https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html#rule-2-javascript-escape-before-inserting-untrusted-data-into-html-attribute-subcontext-within-the-execution-context
- Handling JavaScript context:
  - If the user input is used inside the JavaScript context (between the opening and closing script tags), it should be placed within quotes and JavaScript-encoded (encoding alone may not be sufficient). For example
```
  
  <script>
  // Unsafe: the encoded value is not inside quotes
  var x = <%= Encode.forJavaScript(untrustedData) %>;
  </script>

  <script>
  // Safe: the encoded value is inside a quoted string
  var x = "<%= Encode.forJavaScript(untrustedData) %>";
  </script>
```
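
For applications that generate the script block from JavaScript rather than from a Java template, one commonly used alternative to the encoder call above is to serialize the value with `JSON.stringify` and additionally escape the characters that could end the script element. This is a different technique from `Encode.forJavaScript`, sketched here under that assumption; the helper name `toScriptStringLiteral` is hypothetical.

```javascript
// Serializes a value as a quoted JavaScript string literal that is safe to
// place inside a <script> block: JSON.stringify adds the quotes and escapes
// quotes/backslashes, and the replacements neutralize HTML-significant characters.
function toScriptStringLiteral(value) {
  return JSON.stringify(String(value))
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026")
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const untrusted = "</script><script>alert(1)</script>";
const scriptBlock =
  "<script>var x = " + toScriptStringLiteral(untrusted) + ";</script>";

console.log(scriptBlock);
```

The value stays inside a quoted string, and the attacker's closing script tag can no longer terminate the block early.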
  - In functions that accept code as a string, JavaScript encoding is not sufficient. The strategy of ensuring that user-supplied data reaches the DOM only as a string does not work here, because the string itself is converted to code. A good rule of thumb for DOM APIs is that anything that converts text to DOM or text to script is a potential XSS attack vector. Hence untrusted user input should be kept out of the following functions.
    - setAttribute(). For example, look at the code below:
      <img id="1" src="http://example.com/abc.jpg">
      <script>
        let i1 = document.getElementById("1");
        // The second argument to setAttribute is user input / untrusted data that is
        // JavaScript-encoded, but it will still execute: the attribute name (onclick)
        // is an event handler, so the string is converted to JavaScript and executed.
        i1.setAttribute("onclick", "\u0061\u006c\u0065\u0072\u0074\u0028\u0032\u0032\u0029");
      </script>
      
      Note that in the case of rendering context, javascript + HTML encoding would have worked. In general, the rendering context is safer compared to the execution context.
       <div class="user input should be html attribute encoded as rendering context">
    - eval(). eval() is JavaScript's global function that evaluates the specified string as JavaScript code and executes it (i.e. it takes a string as input and tries to run it as JavaScript). eval should never receive user-supplied data, even after encoding.
    - setInterval, setTimeout, and new Function have the same issue, as they accept code as a string. User-supplied input must be avoided, as encoding is not enough.
    - const parser = new DOMParser(); const html = parser.parseFromString('<script>alert("hi");</script>', 'text/html');
      
      Again, text is being converted to DOM, which is not safe. parseFromString should never receive user-supplied data, even after encoding.
    - Normally, user input in functions that convert text to code should be avoided; however, under certain circumstances they can be safe. Read more: https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html#rule-3-be-careful-when-inserting-untrusted-data-into-the-event-handler-and-javascript-code-subcontexts-within-an-execution-context
    - Check guideline #5: https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html#guideline-5-avoid-the-numerous-methods-which-implicitly-eval-data-passed-to-it
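
When the untrusted string is meant to carry data rather than code, one way to stay away from eval-style functions is to parse it with `JSON.parse`, which never executes its input. This is a sketch of that substitution, not something the list above prescribes; the wrapper `parseUntrusted` is a hypothetical name.

```javascript
// Parses untrusted text as data only. JSON.parse does not execute code;
// anything that is not valid JSON is rejected with an exception.
function parseUntrusted(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, value: undefined };
  }
}

const good = parseUntrusted('{"name": "Alice"}');
const bad = parseUntrusted("alert(document.cookie)");

console.log(good);
console.log(bad);
```

The code-like input is simply rejected, whereas passing the same string to eval would have run it.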
- Handling CSS context
  - Use CSS escaping when untrusted data is inserted inside inline CSS styles.
  - Many CSS styles can be used to smuggle a script into your page. For example, executing JavaScript from a CSS context requires passing JavaScript to one of the following:
    - The stylesheet expression(...) method, which permits JavaScript syntax in some browsers. The JavaScript expression is evaluated and its value is used in the CSS.
    - The CSS url('javascript:...') method on properties that support it. For example, { background-image: url("javascript:alert('xss')"); } is an unsafe context. Here, ensure that data passed to the url() method is URL-encoded and then JavaScript-encoded, e.g.
    - document.body.style.backgroundImage = "url(<%=ESAPI.encoder().encodeForJavaScript(ESAPI.encoder().encodeForURL(companyName))%>)";

  - When using JavaScript, style.property is a safe sink, e.g.
```

<script>
// Safe sink: the untrusted value can only become the value of the color property
document.getElementById("myH1").style.color = untrustedData;
</script>
```
  - Note that user input can only be used for the value of a CSS property. Other CSS contexts are not safe, e.g.
```

<style>User input directly in CSS is not safe </style>
```
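
CSS escaping of a property value can be sketched as follows. The helper `escapeCssValue` is an assumption written for this example: it hex-escapes every non-alphanumeric character, which is deliberately aggressive.

```javascript
// Escapes every character outside [A-Za-z0-9] as a CSS hex escape.
// The trailing space terminates the escape so that following hex digits
// are not absorbed into it.
function escapeCssValue(value) {
  return Array.from(String(value), (ch) =>
    /[A-Za-z0-9]/.test(ch) ? ch : "\\" + ch.codePointAt(0).toString(16) + " "
  ).join("");
}

// The attacker tries to close the declaration and add another one.
const untrustedColor = "red; background-image: url(javascript:alert(1))";
const rule = "h1 { color: " + escapeCssValue(untrustedColor) + "; }";

console.log(rule);
```

After escaping, the semicolon, colon, and parentheses in the input are no longer CSS syntax, so the value cannot break out of the color declaration.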
- Handling URL context. If user input is being placed in a URL, URL encoding should be used.
- HTML attribute encoding + URL encoding.
  - In the rendering context, if user input is used in the href of an anchor tag, URL encoding followed by HTML attribute encoding should be used.
  - ```
  
  
  
  <a href="http://clarifyforme.com?param=attributeEncode(urlEncode(<user input>))">click</a>
```
- For the execution context: URL-escape and then JavaScript-escape untrusted data before inserting it into a URL attribute subcontext within the execution context.
  - Check rule #5 in https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html.
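
For the rendering context, the order "URL-encode, then HTML-attribute-encode" can be sketched as below. The base URL `https://example.test/search`, the parameter name `q`, and both helper names are assumptions; the attribute encoder is a minimal one that covers only five characters.

```javascript
// Minimal HTML attribute encoder for this sketch (encodes & < > " ').
function escapeHtmlAttribute(value) {
  return String(value).replace(
    /[&<>"']/g,
    (ch) => "&#x" + ch.charCodeAt(0).toString(16) + ";"
  );
}

function buildSearchLink(userInput) {
  // Step 1: URL-encode the untrusted value as a query parameter.
  const href = "https://example.test/search?q=" + encodeURIComponent(userInput);
  // Step 2: HTML-attribute-encode the whole URL before placing it in href.
  return '<a href="' + escapeHtmlAttribute(href) + '">click</a>';
}

console.log(buildSearchLink('" onmouseover="alert(1)'));
console.log(buildSearchLink("it's"));
```

The second call shows why both layers matter: `encodeURIComponent` leaves the apostrophe untouched, and the attribute encoding is what neutralizes it.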

How does React framework prevent XSS

React APIs auto-encode: The React framework abstracts away the details of the browser's DOM and provides a higher-level API to render components. It handles all the details of putting data into the DOM. Components use the React APIs or the JSX templating language to define what should be rendered, and under the hood React instructs the browser to create the proper elements and update the DOM. React APIs automatically escape values so that malicious code, if any, is seen by the browser as plain text. Consider the React createElement API.

React.createElement("h1", {}, 'This is example comment');

This API creates and returns a new React element of the given type.

The first argument is the type of element we're creating, in this case an <h1> tag. This could also be another React component.
The second argument is an object containing properties ('props' in React terms) that get passed to the component. React will protect against invalid values of keys in the props object. Read more : https://reactjs.org/warnings/unknown-prop.html
The last argument is the children of that component. React will autoescape this argument.

Whether components are generated using the React API or JSX code, React applies auto-escaping. React also does URL sanitization. As already explained in the article, URLs that use javascript: as the scheme, instead of http: or https:, pose an XSS risk.

Rendering HTML in React: Real-world applications often need to render dynamic HTML code. That HTML typically originates from untrusted sources, such as user-provided data; for example, rich text editors generate HTML as output. When we put this output into the DOM using React APIs, React ensures it is properly encoded. The encoded HTML is then shown as text, whereas the requirement is that the output be treated as HTML code, not text.

For this use case, React exposes the innerHTML attribute through the dangerouslySetInnerHTML property. The name of the property only serves as a warning to developers that it is not safe to use; by itself, the API does not provide any protection. HTML sanitization should be done to protect against XSS in this case, since output encoding would encode everything and cannot be used. Sanitization removes malicious code. DOMPurify is a lightweight JavaScript-based HTML sanitizer that can be used for this.

Escape hatches in React: React offers a higher-level API to render components, and direct access to the DOM is generally not required. However, the higher-level API is not always enough: in certain scenarios, developers need direct access to native DOM elements. To support such use cases, React offers escape hatches that give the application direct access to the native DOM. Two concrete escape hatches are findDOMNode and createRef. When the DOM is used directly, React's auto-escaping is not in effect, so HTML sanitization or escaping becomes the developer's responsibility.