URL Encoding

No.of Views2733
Bookmarked0 times
Downloads 
Votes0
By  amalhashim   On  15 Feb 2010 23:02:11
Tag : CSharp , Miscellaneous
URL Encoding
emailbookmarkadd commentsprint

Images in this article missing? We recently lost them in a site migration. We're working to restore these as you read this. Should you need an image in an emergency, please contact us at info@codegain.com

 RFC 1738: Uniform Resource Locators (URL) specification

The specification for URLs (RFC 1738, Dec. '94) poses a problem, in that it limits the use of allowed characters in URLs to only a limited subset of the US-ASCII character set:

"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."

HTML, on the other hand, allows the entire range of the ISO-8859-1 (ISO-Latin) character set to be used in documents - and HTML4 expands the allowable range to include all of the Unicode character set as well. In the case of non-ISO-8859-1 characters (characters above FF hex/255 decimal in the Unicode set), they just can not be used in URLs, because there is no safe way to specify character set information in the URL content yet [RFC2396.]

URLs should be encoded everywhere in an HTML document that a URL is referenced to import an object (A, APPLET, AREA, BASE, BGSOUND, BODY, EMBED, FORM, FRAME, IFRAME, ILAYER, IMG, ISINDEX, INPUT, LAYER, LINK, OBJECT, SCRIPT, SOUND, TABLE, TD, TH, and TR elements.)

Characters that must be encoded includes the following

ASCII Control characters

These are ASCII non-printable character. Includes the ISO-8859-1 (ISO-Latin) character ranges 00-1F hex (0-31 decimal) and 7F (127 decimal.)

Non-ASCII characters

These are by definition not legal in URLs since they are not in the ASCII set. Includes the entire "top half" of the ISO-Latin set 80-FF hex (128-255 decimal.)

Reserved characters

URLs use some characters for special use in defining their syntax. When these characters are not used in their special role inside a URL, they need to be encoded. Below table provides a snapshot of the reserved character.

CharacterCode
Points
(Hex)
Code
Points
(Dec)
Dollar ("$")
Ampersand ("&")
Plus ("+")
Comma (",")
Forward slash/Virgule ("/")
Colon (":")
Semi-colon (";")
Equals ("=")
Question mark ("?")
'At' symbol ("@")
24
26
2B
2C
2F
3A
3B
3D
3F
40
36
38
43
44
47
58
59
61
63
64

Unsafe characters

Some characters present the possibility of being misunderstood within URLs for various reasons. These characters should also always be encoded.

CharacterCode
Points
(Hex)
Code
Points
(Dec)
Why encode?
Space2032Significant sequences of spaces may be lost in some uses (especially multiple spaces)
Quotation marks
'Less Than' symbol ("<")
'Greater Than' symbol (">")
22
3C
3E
34
60
62
These characters are often used to delimit URLs in plain text.
'Pound' character ("#")2335This is used in URLs to indicate where a fragment identifier (bookmarks/anchors in HTML) begins.
Percent character ("%")2537This is used to URL encode/escape other characters, so it should itself also be encoded.
Misc. characters:
Left Curly Brace ("{")
Right Curly Brace ("}")
Vertical Bar/Pipe ("|")
Backslash ("\")
Caret ("^")
Tilde ("~")
Left Square Bracket ("[")
Right Square Bracket ("]")
Grave Accent ("`")

7B
7D
7C
5C
5E
7E
5B
5D
60

123
125
124
92
94
126
91
93
96
Some systems can possibly modify these characters.


How are characters URL Encoded?

URL encoding of a character consists of a "%" symbol, followed by the two-digit hexadecimal representation (case-insensitive) of the ISO-Latin code point for the character.

Using C# How you can encode and decode URL's

.Net provide HttpUtility class for encoding and decoding URL's. You need to add System.Web reference to your project. Once you have added you can use the below code for encoding and decoding.

{codecitation class="brush: csharp; gutter: true;" width="500px"}

using System;
using System.Web;

namespace TestProject
{
class Program
{
static void Main(string[] args)
{
string encodedString = HttpUtility.UrlEncode("[hi this is a sample]");
Console.WriteLine("Encoded String : {0}", encodedString);

string decodedString = HttpUtility.UrlDecode(encodedString);
Console.WriteLine("Decoded String : {0}", decodedString);

Console.ReadLine();
}
}
}
{/codecitation}

Thank you
Amal

 
Sign Up to vote for this article
 
About Author
 
amalhashim
Occupation-Software Engineer
Company-Aditi Technologies
Member Type-Senior
Location-Not Provided
Joined date-07 Jun 2009
Home Page-http://lamahashim.blogspot.com
Blog Page-http://lamahashim.blogspot.com
I have done my masters in Computer Applications and graduation in Computer Science. I have great passion in working with Microsoft tool and technologies. I am also a Microsoft Most Valuable Professional. Personally my objective is to design/develop applications which eases user experience and performs better in long run.
 
 
Other popularSectionarticles
Comments
There is no comments for this articles.
Leave a Reply
Title:
Display Name:
Email:
(not display in page for the security purphase)
Website:
Message:
Please refresh your screen using Ctrl+F5
If you can't read this number refresh your screen
Please input the anti-spam code that you can read in the image.
^ Scroll to Top