There are many different programming languages, frameworks, tools, and technologies that appear and disappear in software engineering field. But the fundamentals of computer science remain unchanged – data structures. Not only data structures represent a great method for storing information but they also are some special algorithms providing efficient management of data.
Data structures play crucial role in modern software engineering in case developers work with cloud scalability, real-time data analysis, and artificial intelligence. The choice of proper data structures is the key factor that guarantees smooth operation of applications in the production environment. Every software engineer needs to learn these important data structures to build efficient software products, pass technical interviews, and design complex systems.
1. Arrays and Strings:
The Basics of Linear Data StructureArrays are among the most basic linear data structures in computer science. An array includes a series of elements of identical data type allocated in consecutive memory cells. The linear allocation of arrays makes them really convenient for reading and writing operations. An access to an element takes constant $O(1)$ time if you know its index in advance. The computation of the address of the corresponding memory cell is performed via an arithmetic equation.
The Cost
However, arrays require much time for performing any actions related to the manipulation of their contents. Insertion and deletion of elements in the beginning or middle of an array involve relocation of all subsequent elements. Thus, the cost of such actions is linear $O(n)$ time. In addition, arrays are static structures and their sizes should be defined in advance. They may waste a lot of memory or even lead to buffer overflow when working with data dynamically.
2. Linked Lists:
The Dynamics of Data AllocationLinked lists are similar to arrays in the way they store data but they do not require contiguous memory locations. Each element of a linked list is represented as a node with two fields: the element data and a pointer to the next element. There are two types of lists: Singly Linked Lists (forward-only pointers) and Doubly Linked Lists (both forward and backward pointers).
The major advantage of linked lists is their dynamic nature that allows allocating memory only when the new element appears. The size of a list varies in response to the allocation and deallocation of memory during the runtime.
Comparison with Arrays in Terms of Performance
It takes constant $O(1)$ time to insert or delete an element if the pointer to the target node is known. It requires only changing the pointers of adjacent nodes. Nevertheless, unlike arrays, linked lists provide no ability to perform random access to elements. Searching for the n-th element requires sequential search starting from the head of the list and takes $O(n)$ time. In addition, they consume extra memory due to additional pointers.
3. Hash Tables:
Achieving $O(1)$ Time in AccessHash tables (dictionaries, maps, and associative arrays) represent one of the most powerful data structures designed by software engineers. A hash table works as follows: a hash function computes a unique index for a certain key. Then, the hash function allocates a memory cell where the key-value pair will be stored.
Mechanism of Collision Resolution
Of course, it might happen that different keys produce identical indices. Thus, a developer should know how programming languages resolve the issue of collisions:
Chaining: The hash table uses an array of linked lists. So, when the collision happens, the key-value pair is appended to the list of the bucket that corresponds to the index.
Open Addressing (Linear Probing): In case the hash function produces an index that already contains some data, the hash table searches for the nearest available slot and writes the data there.
4. Stacks and Queues:
Linear Data Structures with Special RulesStacks and queues represent two linear data structures that impose certain restrictions on the insertion and deletion of elements. These data structures control the process of data access in a specific way.
Stacks: Last-In, First-Out (LIFO)
A stack allows access only to the latest inserted element (the top element). The LIFO rule is implemented via the use of two operations: push (adding a new element to the top) and pop (removing the top element). Stacks are widely used to control blocks execution in programming languages (Call Stack). In addition, stacks are applied for building Undo/Redo mechanisms and browser history feature in web browsers.
Queues: First-In, First-Out (FIFO)
A queue represents a virtual queue for elements where elements are added to the rear end and removed from the front end. The linear access to elements is provided via two operations: enqueue (adding a new element) and dequeue (removing the element from the front end). Queues are typically applied to implementing asynchronous processes, background CPU scheduling, rate limiting, and inter-process communication (RabbitMQ and Apache Kafka).
5. Trees:
Hierarchical Data Structures with Logarithmic GrowthTrees represent non-linear data structures that are applied in case the data has a hierarchical nature. Tree consists of nodes linked by edges in a parent-child relationship starting from the root node (the node without parents).
Binary Search Trees (BST) and Self-Balancing TreesAmong the most popular trees, there is a Binary Tree where each node has no more than two child nodes. BST expands the Binary Tree structure by introducing a rule that all nodes of the left subtree are less than the root node while all nodes of the right subtree are greater than the root node. Hence, BST allows effective data manipulation with logarithmic $O(\log n)$ time.
Frequently Asked Questions (FAQ)
1. What is the difference between logical and physical data structures?
Logical data structures (abstract data types or ADTs) describe the way data should be accessed from the development perspective without considering its physical location (stacks (LIFO) and queues (FIFO) can be realized either as arrays or linked lists). Physical data structures describe the arrangement of data in the computer memory (arrays allocate adjacent slots; linked lists randomly place elements using memory references).
2. Under what circumstances the worst-case performance of a hash table becomes $O(n)$?
From the theoretical point of view, the worst-case performance of a hash table degrades to linear $O(n)$ in case the hash function generates identical indices for all unique keys. In such a case, the hash table turns into a big linked list where the program finds the required element by means of sequential access. The current programming languages prevent the problem by means of cryptography and dynamic resizing of hash tables.
3. Under what circumstances should a developer choose a Balanced Binary Search Tree over a Hash Table?
Although hash tables allow searching for the required data faster in average ($O(1)$ vs $O(\log n)$), they are unordered data structures that cannot easily extract data in a sorted order or filter data within certain intervals (e. g., “Find all users aged from 21 to 35”). BST keeps the ordering of data and should be preferred in such cases.
4. Why are graphs considered to be non-linear data structures?
Unlike arrays, stacks, queues, and other linear data structures, elements of non-linear structures are not placed in a row and form a sequence of elements (each element has only one clear predecessor/successor except the absolute first and last elements). Graphs and trees allow establishing multiple relations with several elements at once.


