Finally a great parsing example! I see you haven't done much blogging in the last 8 years, but would it be OK to pack up this code and publish it on GitHub (giving you credit in every way I can muster)?
It seems from some previous comments that you explicitly intend for this code to be free and shareable, but just thought I'd ask in any case.
Edit: I went wild and put it up on GitHub. Shout out here or in the issue tracker if you want the repo handed over to you. Also, would be really cool if you OK-ed a license of some kind, ref Issue #1 :-)
Yes, you're welcome to, Carl-Erik Kopseng! (and thanks for asking/reminding me)
I've just put a license (MIT) on the blog footer too.
@RichB: Thanks Rich - I don't currently publish a licence under which snippets and work on this blog can be re-used. But have had the question come up previously --> http://www.singular.co.nz/b...
I've intended it to be freely available, customisable and distributable.
I'll choose a license shortly, however, please feel free to use it as you will until I've specified a license.
HTH
I'd love to use this. What's your license on it?
Also, .Net 4.5 has an API for this, which at the time of writing is a Work-In-Progress in Mono:
https://github.com/mono/mon...
@Jacob: Hmm, certainly something to think about regarding precedence. The RFC doesn't give a specific example - I've popped the question over to Stack Overflow (http://stackoverflow.com/qu...) to see what others think.
Indeed both are acceptable, but is there really still a precedence?
Great article. Like you I'm also a purist with this type of thing. Even though a simpler approach would work *most* of the time, I'm the type of guy who wants it to work *every* time.
In the end I opted to roll my own code rather than use yours because I felt that using regular expressions would make it simpler. The regular expression below matches a single acceptable encoding and optional quality value. Use it with the Matches method of a Regex (hopefully it won't get mangled in the comments).
@"(?<type>[^\s;,]+)(?:\s*;\s*q\s*=\s*(?<quality>\d?\.\d{1,3}|\d))?\s*(?:,|$)"
Also, after reading the HTTP spec a dozen times, I'm not sure I agree with you on point 2 above. There's nothing I have found that would imply that the order the codings are listed in has any bearing on their "priority". An accept encoding of "deflate,gzip" would be the equivalent of "gzip;q=1,deflate;q=1". The browser is prioritizing both equally, and therefore either response is valid.
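A minimal sketch of how the pattern above could be applied, written in Python rather than .NET (so the named groups are spelled `(?P<name>...)`). The function name `parse_codings` is hypothetical, and the default of q=1 for a missing quality value follows the equivalence described in this comment.

```python
import re

# Python spelling of the commenter's pattern; only the named-group
# syntax differs from the .NET version quoted above.
ENCODING = re.compile(
    r"(?P<type>[^\s;,]+)"
    r"(?:\s*;\s*q\s*=\s*(?P<quality>\d?\.\d{1,3}|\d))?"
    r"\s*(?:,|$)"
)

def parse_codings(header):
    """Return (coding, q) pairs in header order; q defaults to 1.0."""
    pairs = []
    for match in ENCODING.finditer(header):
        quality = match.group("quality")
        q = float(quality) if quality is not None else 1.0
        pairs.append((match.group("type").lower(), q))
    return pairs

# Both spellings carry the same information once order is ignored.
plain = parse_codings("deflate,gzip")
explicit = parse_codings("gzip;q=1,deflate;q=1")
print(sorted(plain) == sorted(explicit))
```

Under this reading, a server that treats equal q-values as a tie is free to pick either coding, which is the point the comment makes about "deflate,gzip".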
I'll be using it for Accept-Charset for which it looks just fine! So thanks for posting it.
@Piers: Thanks for the comment. You are correct in a sense. I remember looking at the RFC at the time of writing the post, but didn't believe it applied to the Accept-Encoding header as there wasn't additional information about it - no examples are given under the "Accept-Encoding" section, only for "Accept", which is for media type rather than encoding.
The 'level' token seems to be classified as an "accept-extension", and while they give examples, they don't explain what they are used for. I could only find a reference (on an Apache resource) that it is "used to give the version of text/html media types".
It doesn't seem to be legal syntax to send anything beyond the simple syntax for the Accept-Encoding header - at least as far as I could interpret. So it would seem that the code above is still sound as it is for the purposes of the post's topic of compression.
However, I suggested that it could be used for the other "Accept" headers, and in hindsight (and after a little more investigation, as I didn't look deeply at the other header specs at the time) that is not the case. It can only be applied to the "Accept-Encoding" header.
Thanks again, I appreciate being kept on my toes :)
I like this and will probably use it in my RESTful MVC example. I do have one issue with the article though... I don't think your code will deal with the Accept header as you suggest. The Accept header requires a more complex comparison, taking into account the q value but also how specific the "Name" element is. Also, the Accept header allows each media type to have more than one parameter (i.e. there could be more than just q=0.8). For instance, RFC 2616 gives this example:
Accept: text/*;q=0.3, text/html;q=0.7, text/html;level=1,
text/html;level=2;q=0.4, */*;q=0.5
But for dealing with the simpler headers, this looks spot on!
BTW There is an example SDK from Microsoft on Codeplex that contains code for parsing the Accept headers into a sorted list.
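The "more complex comparison" can be sketched roughly as follows, in Python rather than .NET. This is an illustration under stated assumptions, not the SDK's code: `parse_accept` and `quality_for` are hypothetical names, every non-q parameter is treated as a media type parameter, and the most specific matching range decides the quality.

```python
def parse_accept(header):
    """Split an Accept header into (type, subtype, params, q) tuples."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        main, _, sub = parts[0].lower().partition("/")
        params, q = {}, 1.0
        for part in parts[1:]:
            key, _, value = part.partition("=")
            key, value = key.strip().lower(), value.strip()
            if key == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: a malformed q means "not acceptable"
            else:
                params[key] = value
        ranges.append((main, sub, params, q))
    return ranges

def quality_for(media_type, params, header):
    """Return the q of the most specific range matching media_type."""
    main, _, sub = media_type.lower().partition("/")
    best_rank, best_q = -1, 0.0
    for r_main, r_sub, r_params, q in parse_accept(header):
        if r_main not in ("*", main) or r_sub not in ("*", sub):
            continue
        if any(params.get(k) != v for k, v in r_params.items()):
            continue
        # */* ranks 0, text/* 1, text/html 2, text/html;level=1 3
        rank = (r_main != "*") + (r_sub != "*") + len(r_params)
        if rank > best_rank:
            best_rank, best_q = rank, q
    return best_q

HEADER = ("text/*;q=0.3, text/html;q=0.7, text/html;level=1, "
          "text/html;level=2;q=0.4, */*;q=0.5")
print(quality_for("text/html", {"level": "1"}, HEADER))
print(quality_for("image/jpeg", {}, HEADER))
```

The header string is the RFC 2616 example quoted in this comment, joined onto one line. A flat name-to-q lookup cannot express the difference between `text/html` and `text/html;level=2`, which is why the simpler parser suits Accept-Encoding but not Accept.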
You might have a bigger problem here. Don't you actually mean Transfer-Encoding? Is compressing, as you're using it here, a content encoding? Go read the RFC before responding.
Thanks a bunch for this Dave. I was using a modified version of B. Lowery's encoding and quality detection code - but this is much cleaner. Awesome. Have just updated my photo gallery app at 58bits.com (with credits and a link to this post in the code).
Hi Paul, sorry for the late reply.
It sounds to me like 'something' along the way is modifying the header - a proxy being a likely candidate - it may be your ISP as well.
I would think this happens often, but since it never has a real effect, as you pointed out, it is a non-issue, and we (developers) tend not to notice it.
In regards to the VARY response header, it's best (and common in code in the wild) to specify the header, especially if you're trying to achieve compression. It tells downstream proxies and caches under which circumstances they should cache or not.
The response header "Vary: Accept-Encoding" helps when, for instance a gzip file is cached by a proxy, and a new client requests the same file, but can't handle gzip. It (the proxy) should then serve a more appropriate file based on the new request header "Accept-Encoding".
Although, whether any of the intermediate proxies follow all the rules and guidelines laid out in a standard is open for debate :)
As a side note, I've wasted quite a few hours trying to get proxies to obey headers sent from a server, with varying degrees of success. One such case recently was trying to get the proxy to close the connection so a 'simple' Response.Redirect from HTTP to HTTPS would work. I never quite got this one fixed and had to jump through a few hoops to get the very simple redirect to work. I think a lot comes down to configuration of the proxy in question - and whether the proxy owner has the time or inclination to help.
Hope that helps :)
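To make the effect of "Vary: Accept-Encoding" concrete, here is a toy cache-key sketch in Python. It is only an illustration of the idea, not how Squid or any real proxy is implemented, and `cache_key` is a hypothetical name.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus every request header named in Vary."""
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((name, lowered.get(name, "")) for name in names)

gzip_client = cache_key("/site.css", {"Accept-Encoding": "gzip"}, "Accept-Encoding")
plain_client = cache_key("/site.css", {"Accept-Encoding": "identity"}, "Accept-Encoding")

# Different Accept-Encoding values map to different cache entries, so the
# gzip copy is not handed to the client that asked for identity.
print(gzip_client != plain_client)
```

Without the Vary header the key would be the URL alone, and the second client could be served the cached gzip response.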
I have a "quirk" at the moment: my browser sends a "gzip,deflate" value to localhost. However, when I connect to a server on the net, it only supplies "identity,deflate".
I have only ever seen one or two values supplied by a browser. Would you guess (as I do) that the proxy on "this" network is modifying ACCEPT-ENCODING, essentially saying it will not accept "GZIP"? (The proxy is Squid, FYI.)
Although technically the delivery is a non-issue (it can and does revert to deflate, or identity), I wonder whether this scenario with the proxy is actually a common thing or not.
What are your thoughts?
Paul
PS: There are some "weird" discussions regarding this. (I just read some squid-dev list posts going back to 1998, mentioning a VARY response header from the target server???)
@James L:
That's a fair statement and you are correct in a sense. Although, you would _know_ "If it can read it" or not by your correct interrogation of the headers - basically anything that isn't explicitly denied you could send when accept "*" is specified.
Then, if you're doing things right, the rest becomes pure preference and you are free to send it how you want from what's left over.
To hell with what the client prefers. If it can read it, I'll send it how I bloody well want. Good day sir.
@Miron wrote "Theoretically you are right, but
can you give an example of any browser that will use the request "Accept-Encoding" with the value
"gzip;q=0,deflate" or "*" ?"
Actually "*" is used more than you think, it is used by most of the AJAX requests, as well as intermediate proxy servers like those used to serve Facebook applications.
Thanks for this post I will definitely be using some form of this in my future projects.
@Damien:
If a new encoding 'comes out' you'd have to revisit your code in any case - you'd have to add the compression algorithm itself, then the case for setting up the response filter - so I wouldn't see that as a problem per se.
Although, I do see that the wild card scenario, where it falls back to "gzip", should really loop over the preferred options to find the first accepted encoding, as opposed to simply seeing if "gzip" is not explicitly denied.
PS: In the common code out there, if that 'new encoding' was far fetched and inappropriately named "gzip2008" (or perhaps even "compressingzip", i.e. "Compressing Zip"), everyone would be served content with gzip encoding, because the "gzip" string is still found.
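A rough Python sketch of both points in this reply: matching whole tokens instead of searching for the substring "gzip", and looping over the server's preferred codings when a wildcard is present. The names `parse_qvalues` and `pick_for_wildcard` and the preferred list are assumptions made for illustration, not the post's code.

```python
def parse_qvalues(header):
    """Map each lower-cased coding token to its q-value (default 1.0)."""
    result = {}
    for item in header.split(","):
        token, _, param = item.partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0
        name, _, value = param.partition("=")
        if name.strip().lower() == "q":
            try:
                q = float(value)
            except ValueError:
                q = 0.0  # assumption: a malformed q means "not acceptable"
        result[token] = q
    return result

def pick_for_wildcard(header, preferred=("gzip", "deflate")):
    """Return the first preferred coding that the header does not rule out."""
    accepted = parse_qvalues(header)
    wildcard_q = accepted.get("*", 0.0)
    for coding in preferred:
        # An explicit entry wins; otherwise the wildcard's q applies.
        if accepted.get(coding, wildcard_q) > 0:
            return coding
    return None

print("gzip" in "gzip2008")           # the substring check is fooled
print(pick_for_wildcard("gzip2008"))  # token matching is not
print(pick_for_wildcard("*, gzip;q=0"))
```

With "*, gzip;q=0" the loop skips the explicitly denied gzip and settles on deflate, rather than only checking whether gzip is denied.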
@DavidP:
That's a good point, but not all applications or developments have access to controlling this feature in IIS or by an external device, especially in a shared hosting environment. I imagine it will get easier to achieve with a configuration option in IIS7 so we won't have to touch IIS (in an administrative sense) to get the compression benefits.
But it's not only about compression, the post is really highlighting the point that there is more to interrogating http headers than is commonly practiced.
The problem with this code is that if a new encoding type comes out that browsers specify in preference over gzip this code will not even consider falling back to gzip.
Really that case over .preferred should be a foreach loop over the order of priority and break out as soon as it finds something it can deal with.
[)amien
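The suggested loop might look something like this in Python. It is a sketch only: `negotiate` and the made-up coding "newzip" are hypothetical, and wildcard handling is left out to keep it short.

```python
def negotiate(header, supported=("gzip", "deflate")):
    """Walk the client's codings in priority order; return the first we support."""
    ranked = []
    for item in header.split(","):
        token, _, param = item.partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0
        name, _, value = param.partition("=")
        if name.strip().lower() == "q":
            try:
                q = float(value)
            except ValueError:
                q = 0.0  # assumption: a malformed q means "not acceptable"
        if q > 0:  # q=0 means the client refuses this coding
            ranked.append((q, token))
    # Stable sort: codings with equal q keep their order in the header.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    for _, token in ranked:
        if token in supported:
            return token  # break out at the first coding we can deal with
    return None

# The unknown "newzip" is preferred by the client, but we fall back to gzip.
print(negotiate("newzip, gzip;q=0.8, deflate;q=0.5"))
```

Returning `None` here stands for "send the response uncompressed".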
This is off-topic, but why are we doing HTTP compression at the application level? Wouldn't this be better accomplished using IIS or an external device?
@Miron:
Yes, the semicolon is to separate the quality, and comma for the actual types.
I've updated the post. Good spotting :)
Thanks for your reply.
I do think this check must not be ignored.
Another thing: is "gzip;deflate" valid? Shouldn't the separator character be a comma ','?
@Miron:
I wouldn't think it common place for this to be sent. And quite possibly because it may happen so infrequently, we wouldn't notice a miss in this area. And possibly, in the scenario of simply giving preference to the encoding types, the user agent would be able to handle the served content in any case, which is no biggie.
As to an example of a modern browser with out-of-the-box settings, I wouldn't think there are any.
Proxies altering the headers of those browsers I'm not so sure about, nor non-browser user agents, perhaps used to check/scrape sites with very specific requirements. What if a network admin, reading the HTTP specification, wanted to deny all gzip content (with gzip;q=0) because of some issue with another component on the network? She'd still be served gzip content. Okay, it's not likely, but what if there are such cases? It's all hypothetical really.
This leads me back to the main point - most implementations ignore this preference; should they?
PS: In my last comment to Simone, I looked into whether the accept languages were handled with this in mind but I can't find evidence that they are. Does this mean it's entirely unimportant? The OutputCache module certainly thinks it is important, being the only one that parses the weight of the preference.
@Simone:
I didn't check for this earlier, but having a look via Reflector I found a couple of instances where headers are parsed to return a string array.
A private static method of the internal class "System.Net.HeaderInfoTable.ParseMultiValue(string value)".
And the private static method "System.Web.HttpRequest.ParseMultivalueHeader(string s)".
Both have similar code, but the "qvalue" isn't taken into account.
It's the second method that does the parsing for the "Request.UserLanguages", so no ordering based on preference is undertaken.
The only place I found this working as expected was in "System.Web.Caching.OutputCacheModule".
So no, there doesn't seem to be inbuilt functionality that we can touch for the "Accept-Language" header either.
Theoretically you are right, but
can you give an example of any browser that will use the request "Accept-Encoding" with the value
"gzip;q=0,deflate" or "*" ?
Dave, but isn't the accept-language already parsed in some way by a .NET class already?
I was only searching for an explanation of the encoding quality values, but ended up reading the whole post. Excellent! Clear explanation of stuff most blogs just skip over.