DataTable is an application object that maps to a Database Table. It holds a collection of DataRows and DataColumns where DataRow represents the actual Data while the DataColumns holds Integration rules. Basically we all know the basics of DataTable and often use it for our regular work.
DataSet holds a list of DataTable, and DataRelation. DataTable as being said is the storage of Data in form of a Table with Row and Columns, the DataRelation holds the Relationship rules between two tables in a DataSet. Hence ADO.NET has been structured in such a way that you can have the flavour of entire database from your application end.
But, one thing that you must keep in mind, these objects are created way back in time of .NET 1.0 release or even before when there is no Generics or even the concept of State Machines. Generics allows you to pass type of an object during actual object creation and hence will allow you to create Type information based on the object which we create in real time, as a result it eliminates unnecessary runtime boxing and unboxing. Keeping this in mind, I thought I should check how these classes are made in Framework. Lets look each of them, one by one in this post and lets identify pros and cons.
DataSet is the application object that can hold the entire structure of the database schema. Basically, internally it is a serializable object which holds two set of collection. DataTableCollection and DataRelationCollection where the former is an implementation of an ArrayList of DataTables and the later is a list of DataRelation. DataSet can be Cloned (Produces another object with New tables from existing schema) or Copied (Produces another object with same tables)
Some of the important benefits of DataSet
- Easily Serializable, even could be written to XML File using WriteXML or can be read to the Object.
- Allows you to Merge other DataSet into it easily.
- Can hold Relation and Tables into it.
Problems & best implementation of DataSet
- Can only hold a collection of DataTable, and it does not allow us to define our own data structures and produce custom DataRelation among them.
- No Generic support, use of ArrayList means each element has to be upcast to object when being stored and again to actual element (DataTable) when being retrieved.
DataTable is the primary object which holds the Data. It holds a collection of DataColumn and DataRow. Each of the DataRow holds the data element while DataColumn holds the rules between each columns. The DataRowCollection internally represents an ArrayList of DataRow while DataColumnCollection is on DataColumns.
To hold the DataRows, a DataTable maintains an internal sealed class called RecordManager. Everytime you create a DataRow element inside a DataTable class, the object is stored into RecordManager. The DataRowCollection holds a list of DataRow. Here RBTree class implemented as Generic list of DataRow objects is used.
Some important benefits of DataTable
- DataTable can store any Table even if the data is complex.
- Serializable, so can be ported in communication medium easily.
- Keeps track of indexed field.
Problems of DataTable
- Most of the objects are non-generic, and hence leading to conversion of types.
- Very bulky.
DataRow on the other hand holds the actual data. It has an indexer to the object which lets you get or set data to it.
Now let us take a look at how the data being stored into these data structures and later retrieved from it. Say you have opened a connection to database and used Adapter.Fill to fill data into a DataTable. Now lets examine what internally happens to store Data into the DataTable and also retrieving the same when required. Say the code I write to retrieve the Data is as below :
SqlDataAdapter adpt = new SqlDataAdapter("Select * from Table", conn);
Now when you call Fill method, it actually invokes the external SQL server process to load the Data into memory and parses the loaded memory into the existing data structure. If you have already read my first part of this article , you should be clear how the DataRows are created using DataReader and put into DataRowCollection. In this post I will cover from there, how the interal DataRow is parsed.
The DataReader loads each DataRow into memory and takes you through with calling LoadDataRow method of SchemaMapping. If you see into Reflector the code for LoadDataRow looks like :
Here the initial call to dataReader.GetValues is actually getting the data from memory into DataRow, which is then Loaded into DataTable and finally to the DataSet. Now if you try to look into the defination of GetValues you need to look into the actual implementation of it in the provider specific classes. For SqlClient, the data is loaded using SqlBuffer class. The Buffer reads each individual entries and puts it as an array. So basically for a DataRow object loading Data for Sql Server, it puts the actual Data into an array of SqlBuffer. So if you can see the implementation of SqlBuffer, you would be clear about the actual data storage of DataRow.
So if you look into the implementation of SqlDataReader, you can see it basically maintains the collection of SqlBuffer. SqlBuffer is the actual unit of data storage. It maintains a structure Storage in it which has member to store almost all data basic data types The Storage stores the actual data in Type and the SqlBuffer holds the exact type information.
Here inside the SqlBuffer you can see the member _value as Storage. StorageType on the other hand is an Enum which sets the actual type of Data associated in the DataRow. The SqlBuffer also exposes properties, each of which can get or set the value for the object. The exact type information is also held into DataColumnCollection and according to that the Type of the Data is parsed.
Hence, this means the DataRow is actually stored internally into proper types and hence does not have any performance implication, right? No, there is.
Actually the performance implication lies above these fact when you try to retrieve the value or store it into the DataRow object. We generally use indexer to fetch data from the DataRow. The indexer to the DataRow is actually of type object.
public object this
This is the major problem to DataRow class. So eventually, every time you try to retrieve data from DataRow, even though proper type is maintained underneath, the indexer box to object type. Also, upon using setter, the proper type maintained from our code is also being boxed to the setter and which again unboxed to proper data type.
I think these classes could be generalized more to induce better performance to the API. I am also hoping to have an enumeration of generic objects in the whole data structure of DataRow and DataColumns (may be using DataCell<t>) in future.
I hope you like the artcile.I don't know if I did made you things clear? I have to put only those which are important to know. I would suggest to use Reflector to see the implementation in detail to know further.