Data Structure and Algorithms (DSA) Tutorial
There is no denying that the computer is one of the greatest inventions of all time. In fact, it is now quite difficult to imagine a world without computers or, to be more precise, the uses and benefits that computers offer. To say that computers have made lives easier would be an understatement. In fact, we can readily rattle off several areas where computers play a major role in solving problems or providing solutions to make tasks easier and actually doable.
However, there are those who would refute the benefits of using a computer, stating that, instead of making their lives easier, it actually added more complexity to things, resulting in errors, mistakes, and unsatisfactory output. Worse, these shortcomings is likely to have an impact on major decisions
Certainly, one of the core functionalities (and benefits) of computers is in data management, from collecting raw data and processing them into information, to their storage and even up to the point where they will have to be retrieved to be used for one purpose or another.
It is to this end that much focus has to be placed on the organization of data for easier retrieval when they are required. In order to ensure that, organizations and businesses are careful about choosing which computer program to use. A program is essentially a sequence of instructions – written in computer language – in carrying out a task. The creation of these instructions and their subsequent input into the computer in order to completely become a program is referred to as “programming”.
In this discussion, we will be taking a closer look at the two components of a good computer program: data structures and algorithms, also known as DSA. These two must be present in a program to ensure that it does what it supposed, or designed, to do.
The first of the two components of a program is a data structure, which is a specialized and organized “collection and arrangement of data (or data elements) and its subsequent storage in a computer’s memory, in such a way that it can be easily accessed and retrieved when needed.” Since a data structure refers to how data is organized, it also shows the logical relationship between and among the elements of data.
Why is there a need for a data structure?
Not only does a data structure help ensure the effectiveness of a program, but it also improves its efficiencies. As a result, the value of the data will increase, and decision-makers will be able to come up with the right decisions and strategies for implementation.
It is simple, really. With a data structure, data is better organized or arranged, resulting in an efficient program that will operate and run smoothly, even when using complex applications and dealing with functions such calculations. Ultimately, a data structure will have an impact on the decisions made by top management or some other decision-maker.
In order for a data structure to be efficient and effective, three requirements must be met.
- Sufficient disk space for storage. Since a data structure involves the organization and storage of data, there is a need for companies to ensure that there is enough storage or disk space to accommodate all the data items.
- Sufficient operating time. Just because computers are involved does not mean that everything can be accomplished in a split-second. There will still be a need for enough time for the data structure to perform even the most basic operation.
- Programming skill and effort. This is the ‘human’ element. The one who wrote the program (programmer) must have the technical know-how and skill when it comes to all things related to data structure.
Classifications of Data Structures
There are two general classifications of data structures, with subcategories and types within each category.
A. Primitive Data Structures
Also called “primary data structure”, a primitive data structure is one that is created from scratch, so to speak, without using other data structures as support or tool. This is the most basic and simple data structure, which is designed to operate upon by machine-level instructions.
The inflexible nature of this built-in data structure is quite limiting, regardless of its simplicity. Users have varying data needs, which often call for high user involvement in terms of the configuration of the data structure. That is the reason why, between the two major classifications of data structures, the primitive structure is the lesser chosen one.
Briefly, the known types of primitive data structures include the Integer, Floating Point, Double, Character, Enumerated Type, and the “true or false” Boolean structure.
B. Abstract Data Structures
When the users get to use a set of operations in order to define a data type, we are referring to the Abstract Data Structure. Others refer to it as “non-primitive data structure” and “secondary data structure”, a name adapted in reference to the fact that its creation is dependent on a primary data structure.
Compared to primitive data structures, these are considerably more complex, since they put emphasis on the relationships between and among data items in a group. Obviously, between the primitive data structure and the abstract data structure, the latter receives higher preference when dealing with large and connected data.
Further, there are two subcategories under this classification.
Static or Fixed data structures
This type of structure is formed when it is already known, from the outset, how many data items will be contained within it. Thanks to this advanced knowledge, static memory allocation may be used to create the data structure, and the size of the structure may be determined early on, so it can be fixed.
This type of structure is ideal when dealing with a definite or precise number of data items to be stored, making it easier to add or remove data items accordingly, depending on the available space within the memory allocation.
The simplicity of this type of structure is one of its main draws. Since the size of the data structure is already fixed, it is not something that must be constantly checked on or monitored by programmers.
The main argument against it is that there is a possibility that the memory may not be used efficiently, since there is a chance that the focus will be on how to use or fill up the available memory, and not on the relevance or quality of the data being fed into the structure, just to make sure the memory allocation is utilized.
Dynamic or Variable Size data structures
In contrast, the number of data items may be unknown or uncertain at the beginning. This requires a certain degree of flexibility, which is why creation of the data structure is facilitated by the use of dynamic memory allocation. It is dynamic or “variable-size” in the sense that, during execution, the data structure size may grow or shrink as needed, in order to accommodate the data to be stored.
When it comes to memory use, dynamic data structures are much more efficient than their static counterparts. After all, memory will only be used as much as it is needed, with no pressure on anyone to fill it up to maintain a specific fixed size.
It is possible for the structure to overflow or underflow. If there is too much data fed into the structure and it goes beyond the allocated memory limitation, an overflow takes place. In contrast, an underflow increases risks of the structure being empty and unutilized.
Thus, it is important to keep a close eye on the structure size to avoid overflow or underflow. This is another point against dynamic structures, since it entails considerable amount of (complex) programming.
Linear data structures
As the name implies, these structures clearly demonstrate adjacent elements having a direct relationship.
A classic example of a linear data structure is a linked list, where one link (called a node) in the list is directly related to the node next to it, or even on the opposite side, on a one-is-to-one basis. This reference or relationship continues in the same manner until the last node, which will have reference to null.
Here is a great explanation of linked lists you can find in your mobile phone’s contact list.
Non-linear data structures
This is the exact opposite of the linear structure, with adjacent data items not having any linear or direct link. Instead of one data item connecting to another single data item, it is possible for one data item to be related to multiple data items at one time.
A typical representation of this is a tree, which is designed to point to a single or multiple nodes at one time.
Types of Abstract Data Structure
There are a number of data structure types that most of you may be familiar with already, but there may also be one or two that you’ve ever heard of before. Let us take a look at the types of data structure that are used by various entities in organizing data.
An array is often said to be the most basic data structure type, and often categorized as a primary data structure. In this structure, multiple homogeneous valuesor items (or data bits) that may or may not be similar but share a common label, are organized and coordinated into a group.
This basic data structure type involves a record, which is basically a set of fields, with each field consisting of data of certain types, which may or may not be the same. Often, the fields are also presented in a fixed sequence.
An example of a record is that of an employee of an organization, with fields pertaining to his name, monthly rate of salary, and rank or designation. Unlike arrays, the number of fields within a record is generally fixed.
A list is a data set with the data items or elements ordered depending on the specific application or purpose of the data structure. Its main difference with an array is how insertion and deletion of nodes is allowed in lists, but not in arrays.
A list may be linear or non-linear, in the similar vein as the explanation on the linear and non-linear forms of dynamic data structure.
Linear List: This form has the data elements or nodes arranged in accordance with their logical order on the list, creating a sequence. Since the physical order is the same as the logical order, it allows direct indexing, easy navigation and implementation. The disadvantage is that it is inflexible at certain points, resulting to inefficiencies. Adding or deleting a node will have a significant impact, either by shifting existing nodes down and even out of the list with the insertion of a new node, or by moving existing nodes up or ‘forward’ to take the place of the deleted node.
- Stacks: In this type of data structure, when a node is added, it is inserted or “stacked” on the top of the stack, while another node is deleted, also from the same end of the stack, thereby maintaining the fixed size of the data structure. This demonstrates the Last In, First Out (LIFO) concept. Basically, the nodes or elements contained in an array must be of the same data type. Implementation of stacks is through the Push and Pop:Push, when a node is inserted or added into the stack, and Pop, when the most recently pushed node is deleted or extracted from the top of the stack.
- Queues: This linear list is somewhat similar to stacks; however, it adheres to the First In, First Out (FIFO) concept, where the insertions or additions are made at one end (the back of the queue), while the nodes at the other end (the front of the queue) are removed or deleted.
Non-linear list: In a non-linear list, there is no clear logical order or sequence in the physical arrangement of the nodes. As such, it offers high flexibility, allowing the addition of a new node or value anywhere on the list and the memory.
- Trees: A tree, or a linked tree, is essentially a group of nodes representing a value as parent nodes, with references to other nodes, representing their children. Trees are hierarchical – much like how an actual tree looks like – since it comes complete with a root value, and branches or subtrees represented as nodes linked to the parent node. It is also capable of pointing out multiple nodes all at once. Through that setup, trees enable high-speed searching of data and sorting them accordingly. It is also effective in eliminating duplication of nodes or data items.
- Graphs: When it comes to graph in data structure, we are referring to mathematical graphs. If you take a look at the graph, you will notice points where lines meet, and these are called the vertices or the nodes. The lines that link these nodes to each other are the edges. Graphs and Trees are both non-linear, and there are certain similarities. One of the most glaring differences, however, is the absence of any hierarchical relationship between and among the nodes.
Here is a good explanation of tree-like data structures.
How To Select A Data Structure
Now we come to a very important question. After going through the various types of data structures in this article (as well as other data structure types not discussed here), you may be wondering how, then, should you choose or pick a data structure suitable for you.
Before you can select the data structure that best fits your requirements and is expected to bring results, there are a couple of steps that you should first perform.
- Define the problem. What are you hoping to accomplish or provide a solution for using data structure and algorithms?
- Determine resource availability. Pay extra attention on any resource constraint that you must meet.
- Determine the basic operations that requires support of a data structure. It would be a good idea to make estimates so as to avoid any problems later on.
Your objective, the availability of resources, and the determination of operations to be performed will help in deciding which data structure to use.
Say, for example, that you have a problem you want to solve, and the solution entails a process, with a series of steps, instead of a single, decisive solving action. This step-by-step technique or “road map” describes an “algorithm”.
Here is another comprehensive definition provided by author Thomas Cormen and company: an algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values as output.”
Aside from data structure, another tool that is very important for programmers is algorithms. Let us try to take a look at the advantages or benefits of using algorithms.
- The use of algorithms, especially in combination with data structures, will aid in the management and handling of huge amounts of data, even if there are several large databases involved.
- Algorithms offer flexibility in how it is made to be implementable in more than one programming language, making its applicability more widespread.
- Knowledge of algorithms, in combination with data structures, enables a programmer to write the proper codes quickly and efficiently.
- Choosing the right algorithms (and data structure) will ensure the program runs fast while also maintaining integrity and reliability of data.
In consonance with data structure, there are several algorithm categories, and here are five of them.
- Search Algorithm – This algorithm is used to search for a data item within a data structure, allowing the person doing the search to input the appropriate search terms or keywords.
- Sort Algorithm – Sorting, collating and organizing items in an orderly manner can be quite a pain, and that is what this algorithm attempts to address.
- Insert Algorithm – When trying to add a data item online in a data structure via insertion, this algorithm will do the job.
- Update Algorithm – From time to time, updates to existing data items must be made, and that’s where this algorithm fits right in.
- Delete Algorithm – If there is an algorithm to add or insert new items in the data structure, then there is definitely going to be one purposely for deleting or removing an existing data item.
Here is an interesting lecture on algorithms.
Characteristics of an Algorithm
Just as an algorithm is a step-by-step procedure, so is the manner with which it is written. Although programmers have different unique approaches in how they go about their coding projects, the principles remain the same.
There is one thing we have to make clear, however, and that is on how writing algorithms does not follow a certain or specific set of standards or guidelines.
When writing algorithms, make sure that the following characteristics are present.
- An algorithm must have 0 or more inputs and 1 or more outputs that are well-defined and exact, so as to avoid confusion.
- An algorithm must be finite, meaning that, after a specific and defined number of steps in the process, it will cease or terminate.
- An algorithm must be clear and easy to understand, to avoid any ambiguity that will lead to more confusion. It should be straightforward, leaving no doubt as to what direction it aims to take.
- An algorithm must be feasible, particularly with respect to the availability of resources. Many excellent algorithms never saw the light of day in execution or implementation for the simple reason that there weren’t enough resources to actually carry it through.
Measurement Tools for Algorithms
Algorithm is evaluated, more often than not, in terms of efficiency and performance. Analysis of algorithm may be done in two ways: a priori or a posterior.
- A Priori Analysis. This is analysis performed before the implementation, which makes its theoretical in nature. After all, you are essentially analyzing something that has not yet come to pass. This analysis is performed under the assumption that speed of system hardware and software, technical know-how of programmers, and all other factors are constant. Therefore, they will not have any effect once implementation of the algorithm and data structure takes place.
- A Posterior Analysis. After implementation, analysis may be performed on the selected algorithm. This is the more realistic or empirical analysis since it is performed with a strong basis on actual statistics obtained during implementation or execution. Clearly, between the two, the results of the a posterior analysis is more likely to be more reliable and valid.
When assessing the performance of an algorithm, these two properties are taken into consideration.
- Time complexity represents the total length of time or duration needed by a program to complete its operation.
- Space complexity refers to the memory space that the algorithm will require during execution and throughout its life cycle, with a view to maximize memory space.
There are several metrics used in measuring or judging algorithm, and the most commonly used are:
- Runtime, or the total time it took for the algorithm to run. This is highly preferred due to its ease of quantification, which subsequently allows analysts to make comparisons better and faster.
- Ease of programming, taking note of issues such as the presence of bugs and other similar problems.
- Length of the program, signified by the number of lines of code that the programmer has to write.
Data Structure and Algorithms, in general, can be quite a daunting topic to learn about, especially for those with zero background on programming. What we talked about are just the basics and the theoretical aspect.