A Python web framework that makes the most of the
Simplates are the main attraction.
According to Guido, Python “has excellent support for Unicode, and will keep getting better.” The same is true of ... oh no! Snowman is being attacked by Comet! But, look! Linear Buck, hero from beyond the Basic Multilingual Plane, is coming to his rescue! Hooray for Linear Buck!
In designing Aspen’s Unicode handling, the following priorities have been in view:
This document describes Aspen’s approach to Unicode security, and then describes Aspen’s algorithms for decoding Requests and encoding Responses, with reference to the de jure standards, de facto browser behavior, and advanced use cases.
The canonical reference for security issues related to Unicode is Technical Report #36 (TR36), “Unicode Security Considerations,” from the Unicode Consortium.
Most of the discussion revolves around spoofing websites by registering
visually confusing domain names such as
the second ‘a’ is actually from the
Cyrillic and not the Latin alphabet. That’s a problem for browser
vendors to solve, and for you to take advantage of, if you’re a Bad Guy
like Comet (just watch out for Linear Buck!).
What Snowman has to worry about are the “Non-Visual Security Issues.” The basic idea is that any algorithm that mutates character data is a chance for Comet to game that algorithm. If Comet can sneak in an extra path separator or remove a quotation mark, then she may be able to traverse Snowman’s filesystem or inject some extra SQL. What is Snowman to do?
After validating your inputs, make sure that you don’t transcode the data again before using it. Here’s a simple illustration:
open("/../etc/password").read() and returns the result to Comet.
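The validate-then-transcode trap can be sketched in a few lines of Python. This is a hypothetical illustration, not Aspen code: the function names `is_safe`, `unsafe_resolve`, and `safe_resolve` are invented here, and NFKC normalization stands in for "any algorithm that mutates character data." The attack string uses U+FF0E FULLWIDTH FULL STOP, which NFKC folds to an ordinary ASCII dot.

```python
import unicodedata


def is_safe(path):
    """Hypothetical validator: reject any '..' path segment."""
    return ".." not in path.split("/")


def unsafe_resolve(path):
    """Validate first, then normalize: the check sees pre-mutation data."""
    if not is_safe(path):
        raise ValueError("path traversal")
    # NFKC mutates the data *after* validation, so the check is stale.
    return unicodedata.normalize("NFKC", path)


def safe_resolve(path):
    """Normalize first, then validate the exact string that will be used."""
    normalized = unicodedata.normalize("NFKC", path)
    if not is_safe(normalized):
        raise ValueError("path traversal")
    return normalized


# Two fullwidth full stops look harmless to the validator ...
attack = "/\uff0e\uff0e/etc/password"
# ... but become a real '..' segment once normalized.
leaked = unsafe_resolve(attack)
```

Here `unsafe_resolve(attack)` hands back a path containing a genuine `..` segment, while `safe_resolve(attack)` raises `ValueError`. The design point is only the ordering: validate the final form of the data, and do not transform it again afterwards.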
Obviously this is a contrived example, but it makes the point. TR36 mentions seven algorithms in Unicode and goes into the details of how to game them:
ⓔ for ⓔ
The good news is that Python handles almost all of these for us, and Aspen handles the rest. If Aspen is given an HTTP Request that doesn’t decode cleanly according to the algorithm below, then it returns a 400 Bad Request.
Here are the parts of the Request with notes on how Aspen decodes them:
request line
    method           subset of ASCII, per spec
    uri
        path         subset of ASCII, per spec (but WSGI servers do things)
        querystring  subset of ASCII, per spec (but IE sends raw UTF-8)
    version
headers              ???
body                 ???
If a browser or other program sends anything else to Aspen, it’ll get 400 Bad Request.
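As a rough sketch of the "decode strictly or reject" idea, the request line could be handled along these lines. This is not Aspen's actual implementation: `BadRequest` and `decode_request_line` are hypothetical names, and the sketch assumes the raw line has already had its trailing CRLF stripped.

```python
class BadRequest(Exception):
    """Hypothetical exception that a server would map to a 400 response."""
    status = "400 Bad Request"


def decode_request_line(raw):
    """Decode b'METHOD URI VERSION' strictly as ASCII, or raise BadRequest."""
    try:
        # bytes.decode() is strict by default: any byte >= 0x80 raises.
        line = raw.decode("ascii")
    except UnicodeDecodeError:
        raise BadRequest("request line is not ASCII")

    parts = line.split(" ")
    if len(parts) != 3:
        raise BadRequest("malformed request line")

    method, uri, version = parts
    # Split the URI into path and querystring at the first '?'.
    path, _, querystring = uri.partition("?")
    return method, path, querystring, version


parsed = decode_request_line(b"GET /foo?bar=baz HTTP/1.1")
```

With this sketch, a request line carrying raw UTF-8 bytes, such as `b"GET /?q=\xe2\x98\x83 HTTP/1.1"`, raises `BadRequest` rather than being silently repaired, so no lossy or ambiguous decoding ever reaches application code.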
The Aspen Response object takes body as a bytestring or iterable of bytestrings. If you set response.charset in a template resource then that will be added to Content-Type if your mimetype is of major type 'text'. There is no default charset for static resources, which means HTTP-conformant clients will try ISO-8859-1, but most will probably try to guess based on how the bytes smell.
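The charset rule can be illustrated with a small sketch. The helper `content_type_header` is a hypothetical name, not part of Aspen's API; it only mirrors the rule described above, where a charset is appended when one is set and the mimetype's major type is 'text'.

```python
def content_type_header(mimetype, charset=None):
    """Build a Content-Type value, adding charset only for text/* types."""
    major = mimetype.split("/", 1)[0]
    if charset is not None and major == "text":
        return "%s; charset=%s" % (mimetype, charset)
    # No charset set, or not a text type: send the bare mimetype.
    return mimetype


# The body itself must already be bytes; encode it with the same charset
# that is advertised in the header.
body = u"\u2603".encode("utf-8")  # Snowman, as three UTF-8 bytes
header = content_type_header("text/html", "UTF-8")
```

In this sketch, `content_type_header("image/png", "UTF-8")` returns the bare `image/png`, and a text type with no charset set is likewise sent without one, which is the case where clients fall back to ISO-8859-1 or guess.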