Working with Strings with Combining Characters

By  Geming Leader   On  29 May 2010 05:05:24
Tag : CSharp , Miscellaneous

This article is also available on my blog, Just Like a Magic.

This article is also available in Arabic; read it here.

Contents

Contents of this article:

  • Contents
  • Introduction
  • Writing Arabic Diacritics
  • Using the Character Map Application
  • Enumerating a String with Base Characters
  • Enumerating a String with Combining Characters
  • Comparing Strings
  • Try it out!

Introduction

In some languages, such as Arabic and Hebrew, base characters are combined with combining characters according to the pronunciation of the word.

Combining characters are characters (diacritics, for example) that attach to base characters to change the pronunciation of the word. This is sometimes called vocalization.

The following examples show diacritics used as combining characters:

  1. Combining a single character
       Base character:          Arabic Letter Teh (0x062A)
       Combining character(s):  Arabic Damma (0x064F)
       Result:                  Letter Teh + Damma

  2. Combining two characters
       Base character:          Arabic Letter Teh (0x062A)
       Combining character(s):  Arabic Shadda (0x0651), Arabic Fathatan (0x064B)
       Result:                  Letter Teh + Shadda + Fathatan

When you combine a base character with one combining character, the string stores two characters that are displayed as one. When you combine a base character with two combining characters, the string stores 3 characters displayed as one, and so on.
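The difference between stored characters and displayed characters can be checked in code. The following is a minimal sketch, not part of the original article: the class and method names are hypothetical, and the strings are written with Unicode escapes using the code points from the examples above.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, for illustration only.
static class CombiningLengthDemo
{
    // Number of characters (UTF-16 code units) stored in the string.
    public static int CharCount(string s)
    {
        return s.Length;
    }

    // Number of text elements: a base character together with
    // its combining characters counts as one element.
    public static int TextElementCount(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    public static void Print()
    {
        string tehDamma = "\u062A\u064F";                // Teh + Damma
        string tehShaddaFathatan = "\u062A\u0651\u064B"; // Teh + Shadda + Fathatan

        // Prints "2 char(s), 1 text element(s)"
        Console.WriteLine("{0} char(s), {1} text element(s)",
            CharCount(tehDamma), TextElementCount(tehDamma));

        // Prints "3 char(s), 1 text element(s)"
        Console.WriteLine("{0} char(s), {1} text element(s)",
            CharCount(tehShaddaFathatan), TextElementCount(tehShaddaFathatan));
    }
}
```

Calling CombiningLengthDemo.Print() from a console application writes both lines.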

Writing Arabic Diacritics

The following table summarizes the Arabic diacritics and the keyboard shortcut for each character:

Unicode Representation    Name        Shortcut
0x064B                    Fathatan    Shift + W
0x064C                    Dammatan    Shift + R
0x064D                    Kasratan    Shift + S
0x064E                    Fatha       Shift + Q
0x064F                    Damma       Shift + E
0x0650                    Kasra       Shift + A
0x0651                    Shadda      Shift + ~
0x0652                    Sukun       Shift + X
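To check in code whether a character is a combining character, you can ask .NET for its Unicode category. The sketch below is an illustration under stated assumptions: the DiacriticInfo class and its members are hypothetical names, and the array simply lists the code points from the table above.

```csharp
using System.Globalization;

// Hypothetical helper class, for illustration only.
static class DiacriticInfo
{
    // The eight diacritics from the table, 0x064B through 0x0652.
    public static readonly char[] ArabicDiacritics =
    {
        '\u064B', '\u064C', '\u064D', '\u064E',
        '\u064F', '\u0650', '\u0651', '\u0652'
    };

    // True when the character belongs to one of the Unicode
    // "mark" categories, that is, when it combines with a base character.
    public static bool IsCombiningMark(char c)
    {
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
        return category == UnicodeCategory.NonSpacingMark
            || category == UnicodeCategory.SpacingCombiningMark
            || category == UnicodeCategory.EnclosingMark;
    }
}
```

Every entry in ArabicDiacritics is reported as a combining mark, while a base letter such as Teh (0x062A) is not.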

Using the Character Map Application

Microsoft Windows comes with an application that helps you browse the characters that a font supports. This application is called Character Map.

You can access this application by typing charmap.exe into Run, or by going to Start->Programs->Accessories->System Tools->Character Map.

 

Enumerating a String with Base Characters

Now let's try an example. It uses a simple word, مُـحمَّـد (Muhammad, the name of the prophet of Islam).

With its diacritics, this word consists of 9 characters, in the following order:

  1. Meem
  2. Damma (a combining character combined with the previous Meem)
  3. Kashida
  4. Hah
  5. Meem
  6. Shadda (a combining character)
  7. Fatha (a combining character; both the Shadda and the Fatha are combined with the preceding Meem)
  8. Kashida
  9. Dal

After the combining characters are merged with their bases, we end up with 6 characters, in the following order:

  1. Meem (with a Damma above)
  2. Kashida
  3. Hah
  4. Meem (with a Shadda and a Fatha above)
  5. Kashida
  6. Dal
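The grouping above can be reproduced with StringInfo.ParseCombiningCharacters, which returns the index at which each base character starts. This is a minimal sketch rather than the article's own code: the MuhammadWord class name is hypothetical, and the word is spelled out with escapes in the order given in the 9-character list.

```csharp
using System.Globalization;

// Hypothetical helper class, for illustration only.
static class MuhammadWord
{
    // Meem, Damma, Kashida, Hah, Meem, Shadda, Fatha, Kashida, Dal
    public const string Word =
        "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";

    // Index of the first character of every text element.
    // For Word this is { 0, 2, 3, 4, 7, 8 }: six elements,
    // with the Damma (1), Shadda (5), and Fatha (6) attached
    // to the Meem that precedes them.
    public static int[] BaseCharacterIndexes()
    {
        return StringInfo.ParseCombiningCharacters(Word);
    }
}
```

The gaps in the returned indexes (1, 5, and 6 are missing) are exactly the positions of the combining characters.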

The following code simply enumerates the string and displays a message box with each character along with its index:

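// C#
// Requires: using System.Windows.Forms; (for MessageBox)

string name = "مُـحمَّـد";
string result = String.Empty;

// Index the string one character at a time.
for (int i = 0; i < name.Length; i++)
    result += String.Format("{0}\t{1}\n", i, name[i]);

MessageBox.Show(result);

' VB.NET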

Dim name As String = "مُـحمَّـد"
Dim result As String = String.Empty

For i As Integer = 0 To name.Length - 1
    result &= String.Format("{0}{1}{2}{3}", i, vbTab, name(i), vbNewLine)
Next

MessageBox.Show(result)

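What do we get? Indexing the string visits its individual characters one at a time, so we get 9 entries: every combining character is listed on its own, detached from its base character.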
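Because a combining character shown alone in a message box can be hard to read, it may help to print the code point of each character instead. The sketch below is an illustration only: CodeUnitDump and Describe are hypothetical names, and the output format is an arbitrary choice.

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class CodeUnitDump
{
    // Returns one line per character in the form "index: U+XXXX".
    public static string[] Describe(string s)
    {
        string[] lines = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
            lines[i] = String.Format("{0}: U+{1:X4}", i, (int)s[i]);
        return lines;
    }
}
```

For a string that starts with Meem followed by Damma, the first two lines are "0: U+0645" and "1: U+064F".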

Enumerating a String with Combining Characters

The .NET Framework provides a way to enumerate a string together with its combining characters: the TextElementEnumerator and StringInfo types, both of which reside in the System.Globalization namespace.

The following code demonstrates how you can enumerate a string along with its combining characters:

// C#
// Requires: using System.Globalization; using System.Windows.Forms;

string name = "مُـحمَّـد";
string result = String.Empty;

// Each text element is a base character plus its combining characters.
TextElementEnumerator enumerator =
    StringInfo.GetTextElementEnumerator(name);

while (enumerator.MoveNext())
    result += String.Format("{0}\t{1}\n",
        enumerator.ElementIndex, enumerator.Current);

MessageBox.Show(result);

' VB.NET

Dim name As String = "مُـحمَّـد"
Dim result As String = String.Empty

Dim enumerator As TextElementEnumerator = _
StringInfo.GetTextElementEnumerator(name)

While enumerator.MoveNext()
    result &= String.Format("{0}{1}{2}{3}", enumerator.ElementIndex, vbTab, _
        enumerator.Current, vbNewLine)
End While

MessageBox.Show(result)
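The same types can be used to work with whole text elements instead of single characters. The following sketch is an assumption-laden illustration, not the article's code: the TextElements class and its method names are hypothetical, while GetTextElementEnumerator, GetTextElement, and SubstringByTextElements are members of System.Globalization.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper class, for illustration only.
static class TextElements
{
    // Splits a string into text elements, keeping each base
    // character together with its combining characters.
    public static List<string> Split(string s)
    {
        List<string> elements = new List<string>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(s);
        while (enumerator.MoveNext())
            elements.Add(enumerator.GetTextElement());
        return elements;
    }

    // Returns the first 'count' text elements without separating
    // a base character from its diacritics. Throws
    // ArgumentOutOfRangeException if 'count' exceeds the number
    // of text elements in the string.
    public static string Take(string s, int count)
    {
        return new StringInfo(s).SubstringByTextElements(0, count);
    }
}
```

Compared with String.Substring, which counts individual characters, Take counts text elements, so a prefix never ends in the middle of a base character and its diacritics.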

Comparing Strings

Sometimes you need to compare two strings that differ only in their diacritics (combining characters). If you compare them in the usual way (with the two-argument String.Compare, for instance), they are reported as different because of the combining characters.

To overcome this, use the overload of the String.Compare method that accepts a culture and a CompareOptions value.

The Kashida is not a letter of the Arabic alphabet; it behaves more like a space. The CompareOptions.IgnoreSymbols option therefore leaves it out of the comparison.

The following code compares the two strings in three different ways:

// C#

string name1 = "محمد";
string name2 = "مُـحمَّـد";

// 1st check
if (name1 == name2)
    MessageBox.Show("Strings are identical");
else
    MessageBox.Show("Strings are different!");

// 2nd check
if (String.Compare(name1, name2) == 0)
    MessageBox.Show("Strings are identical");
else
    MessageBox.Show("Strings are different!");

// 3rd check
if (String.Compare(name1, name2,
        System.Threading.Thread.CurrentThread.CurrentCulture,
        CompareOptions.IgnoreSymbols) == 0)
    MessageBox.Show("Strings are identical");
else
    MessageBox.Show("Strings are different!");

' VB.NET

Dim name1 As String = "محمد"
Dim name2 As String = "مُـحمَّـد"

' 1st check
If (name1 = name2) Then
    MessageBox.Show("Strings are identical")
Else
    MessageBox.Show("Strings are different!")
End If

' 2nd check
If (String.Compare(name1, name2) = 0) Then
    MessageBox.Show("Strings are identical")
Else
    MessageBox.Show("Strings are different!")
End If

' 3rd check
If (String.Compare(name1, name2, _
        System.Threading.Thread.CurrentThread.CurrentCulture, _
        CompareOptions.IgnoreSymbols) = 0) Then
    MessageBox.Show("Strings are identical")
Else
    MessageBox.Show("Strings are different!")
End If
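
Culture-sensitive comparison depends on the current culture and on the comparison data of the platform, so results may vary between environments. As an alternative, the sketch below removes the diacritics and the Kashida by hand and then compares ordinally. This is a different technique from the String.Compare overload used above, and the ArabicCompare class and its members are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper class, for illustration only.
static class ArabicCompare
{
    private const char Kashida = '\u0640';

    // Removes the Kashida and every nonspacing mark (which
    // includes the Arabic diacritics 0x064B through 0x0652).
    // Assumption: dropping all nonspacing marks is acceptable
    // for the text being compared.
    public static string StripMarks(string s)
    {
        StringBuilder builder = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c == Kashida)
                continue;
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;
            builder.Append(c);
        }
        return builder.ToString();
    }

    // Ordinal comparison of the stripped strings.
    public static bool EqualsIgnoringMarks(string a, string b)
    {
        return String.Equals(StripMarks(a), StripMarks(b), StringComparison.Ordinal);
    }
}
```

With the two names from the example, EqualsIgnoringMarks returns true, because both reduce to the four letters Meem, Hah, Meem, Dal. Note that stripping nonspacing marks also removes accents from decomposed text in other scripts, which may or may not be what you want.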
 
About Author
 
Geming Leader
Occupation: Software Engineer
Company: Just Like a Magic
Member Type: Expert
Location: Egypt
Joined date: 30 Jul 2009
Home Page: http://WithDotNet.net
Blog Page: http://JustLikeAMagic.com
Independent software developer, trainer, and technical writer from Egypt born in 1991
 
 