An algorithm is a finite sequence of actions or steps aimed at solving a specific problem or task.
For example, an algorithm for solving the problem of finding factorials might look something like this:
Task: Determine the factorial of the number n.
- Initialize the variable fact = 1
- For each value of the variable i in the range from 1 to n:
  - Multiply fact by i
- fact now contains the factorial of n
So far the algorithm is written in plain text. If it were written in a programming language, we would call it code.
Let’s write the code to find the factorial of a number in C++:
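A direct translation of the steps above into C++ might look like this (the function name `factorial` is my own choice):

```cpp
#include <cassert>
#include <cstdint>

// Computes n! by following the textual algorithm above:
// start from fact = 1 and multiply by every i from 1 to n.
uint64_t factorial(unsigned n) {
    uint64_t fact = 1;                  // Initialize fact = 1
    for (unsigned i = 1; i <= n; ++i) { // For each i in the range 1..n
        fact *= i;                      // Multiply fact by i
    }
    return fact;                        // fact now contains n!
}
```

Note that with 64-bit arithmetic the result stays exact only up to 20!; larger inputs silently overflow.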
Programming is about data structures and algorithms. Data structures are used to store data and algorithms are used to solve specific problems using that data.
Data Structures and Algorithms (DSA) examine solutions to standard problems in detail and give you a sense of how effective each one is. They also let you evaluate an algorithm's efficiency, which allows you to choose the best option among all the possibilities.
Using data structures and algorithms to create scalable code
Time is a precious resource
Let’s assume that Vika and Kolya are trying to solve a simple problem: finding the sum of the first 10^11 natural numbers. While Kolya was still writing out the algorithm, Vika had already implemented it, proving that it was as easy as two plus two…
Algorithm (Kolya’s)
- Initialize sum = 0
- For each natural number n in the range from 1 to 10^11 (inclusive):
  - Add n to sum
- sum is the answer
Code (Vika’s)
```cpp
long long findSum() {
    long long sum = 0;  // long long: the result does not fit in a 32-bit int
    for (long long i = 1; i <= 100000000000LL; i++) {  // 10^11 iterations
        sum += i;
    }
    return sum;
}
```
Vika and Kolya feel happy that they were able to create something of their own in a very short time. Let’s sneak into their workplace and eavesdrop on their conversation.
Vika: Let’s run this code and find out the sum.
Kolya: I ran this code a few minutes ago, and it still hasn’t produced a result. What’s wrong with it?
Oops, something went wrong! A computer is one of the most deterministic systems there is; simply running the code again will not help. So let’s analyze what’s wrong with this simple code.
Time and memory are some of the most valuable resources for a computer program.
Time taken by the computer to execute the code:
- Code execution time = number of instructions * execution time of each instruction
The number of instructions depends on the code you are using, and the execution time of each instruction depends on your computer system and compiler.
In this case, the total number of instructions executed (call it x) is:
- x = 1 + (10^11 + 1) + 10^11 + 1
- x = 2 × 10^11 + 3
Let’s say the computer can execute y = 10^8 instructions in one second (this may vary depending on your system configuration). Then:
- Execution time of y instructions = 1 second
- Execution time of 1 instruction = 1/y seconds
- Execution time of x instructions = x × (1/y) = x/y seconds
Therefore, code execution time = x/y = (2 × 10^11 + 3) / 10^8 seconds ≈ 2000 seconds (more than 33 minutes).
Is it possible to optimize the algorithm so that Vika and Kolya don’t have to wait 33 minutes every time they run this code? I’m sure you’ve already guessed the right approach. The sum of the first N natural numbers is given by the formula:
Sum = N * (N + 1) / 2
Converting this formula into code would look like this:
```cpp
long long sum(long long N) {
    // Closed-form sum of 1..N; note N * (N + 1) itself overflows
    // 64-bit arithmetic once N exceeds roughly 3 * 10^9.
    return N * (N + 1) / 2;
}
```
This code runs in just a handful of instructions and solves the problem regardless of the value of N (within the limits of the integer type). The result appears instantly.
The time taken to solve the problem in this case is 1/y (about 10 nanoseconds). By the way, the fusion reaction of a hydrogen bomb takes 40–50 ns, which means your program will complete successfully even if someone drops a hydrogen bomb on your computer while the code is running.
┃Note: Computers require multiple instructions (not just one) to perform multiplication and division operations. I only mentioned 1 for simplicity.
More about scalability
Scalability is “scale” plus “ability”: the capacity of an algorithm or system to handle larger problem sizes.
Consider the problem of setting up a classroom for 50 students. One of the simplest solutions is to book a room, set up a board and some chalk, and the problem is solved.
But what if the size of the problem increases? What if the number of students increases to 200?
The solution still exists, but it will require more resources to implement. In this case, you may need a much larger room (such as a theater), a projector screen, and a digital pen.
What if the number of students increases to 1000?
The solution no longer works or uses too many resources when the size of the problem increases. This means that the solution is not scalable.
What is a scalable solution?
If you look at our original solution for finding the sum of the first N natural numbers, it was not particularly scalable: the time it required grew linearly as the size of the problem grew. Such algorithms are known as linearly scalable algorithms.
Our second solution was highly scalable: it did not require additional time to solve a larger problem. Such algorithms are known as constant-time algorithms.
Memory is an expensive resource
Memory is not always available in abundance. When working with code or systems that store or generate large amounts of data, it is critical that your algorithm conserves memory wherever possible. For example, when storing data about people, you can save memory by storing only their date of birth rather than their age: the age can always be computed on the fly from the date of birth and the current date.
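As a sketch of the date-of-birth idea, a person record might store only the birth date and derive the age on demand (the struct and field names here are illustrative):

```cpp
#include <cassert>
#include <tuple>

// Store only the birth date; derive the age whenever it is needed.
struct Person {
    int birthYear, birthMonth, birthDay;

    // Age in whole years on the given date.
    int ageOn(int year, int month, int day) const {
        int age = year - birthYear;
        // Haven't reached this year's birthday yet? Subtract one.
        if (std::make_tuple(month, day) < std::make_tuple(birthMonth, birthDay))
            --age;
        return age;
    }
};
```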
Examples of algorithm efficiency
Examples of what learning algorithms and data structures can do:
Example #1: Age group search problem
Problems such as finding people of a certain age group can easily be solved with a slightly modified version of the binary search algorithm (assuming the data is sorted).
The simplest algorithm, which iterates through all the people one at a time and checks whether each falls into the given age group, scales linearly. Binary search, by contrast, scales logarithmically: if the problem size is squared, the time to solve it only doubles.
Let’s assume that it takes 1 second to find people of a certain age group among 1000 people. Then for a group of 1 million people:
→ the binary search algorithm will take only 2 seconds;
→ the naive algorithm will take about 1000 seconds, roughly 17 minutes.
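On a sorted array of ages, the age-group query reduces to two binary searches; here is a sketch using the standard library (the function name is my own):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Count the people whose age lies in [lo, hi], given ages sorted ascending.
// Two binary searches -> O(log n) instead of scanning all n entries.
long countInAgeGroup(const std::vector<int>& sortedAges, int lo, int hi) {
    auto first = std::lower_bound(sortedAges.begin(), sortedAges.end(), lo);
    auto last  = std::upper_bound(sortedAges.begin(), sortedAges.end(), hi);
    return last - first;
}
```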
A similar binary search technique can be used to find the square root of a number.
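The square-root claim can be sketched the same way: binary search over the answer (an integer version is shown for simplicity):

```cpp
#include <cassert>
#include <cstdint>

// Largest integer r with r*r <= n, found by binary search on r.
uint64_t isqrt(uint64_t n) {
    uint64_t lo = 0, hi = n, ans = 0;
    while (lo <= hi) {
        uint64_t mid = lo + (hi - lo) / 2;
        if (mid != 0 && mid > n / mid) {  // mid*mid > n, written without overflow
            hi = mid - 1;                 // answer is below mid
        } else {                          // mid*mid <= n
            ans = mid;                    // mid is a candidate answer
            lo = mid + 1;                 // try larger values
        }
    }
    return ans;
}
```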
Example #2: Rubik’s Cube Problem
Imagine that you are writing a program to solve a Rubik’s cube problem.
This cute puzzle has a staggeringly large number of positions – 43,252,003,274,489,856,000, and that’s just the positions! Imagine the number of paths that can be taken on the way to incorrect positions.
Fortunately, the states of the puzzle can be represented as a graph. Graph algorithms such as Dijkstra’s (or, since every move has equal cost, plain breadth-first search) can find the shortest path in time roughly linear in the size of the graph. Yes, you heard correctly: they reach the solved position in the minimum number of moves.
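Since every cube move costs the same, the shortest-path search boils down to breadth-first search on the position graph. Here is a toy-scale sketch on a small adjacency list (the actual Rubik’s cube graph is far too large to store explicitly):

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Breadth-first search: fewest-move distance from `start` to every node
// of an unweighted graph given as an adjacency list. -1 means unreachable.
std::vector<int> bfsDistances(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> q;
    dist[start] = 0;
    q.push(start);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u]) {
            if (dist[v] == -1) {       // first visit = shortest distance
                dist[v] = dist[u] + 1;
                q.push(v);
            }
        }
    }
    return dist;
}
```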
Example #3: DNA Challenge
DNA is a molecule that carries genetic information. It consists of smaller units, denoted by the letters A, C, T, and G.
Imagine you are working in bioinformatics and are tasked with finding occurrences of a specific pattern in a DNA strand.
This is a well-known problem in the academic computer science community, and the simplest algorithm requires time proportional to (number of characters in the DNA strand) × (number of characters in the pattern).
A typical DNA strand contains millions of these units. But don’t worry: the KMP algorithm (Knuth–Morris–Pratt) can complete the task in time proportional to (number of characters in the DNA strand) + (number of characters in the pattern).
Replacing the × operator with a + makes a dramatic difference.
If the pattern is 100 characters long, your algorithm becomes roughly 100 times faster; if it were 1000 characters long, KMP would be almost 1000 times faster. In other words, if you could previously find a pattern in 1 second, you now need only 1 ms, so instead of searching one strand you can search 1000 strands of similar length in the same time.
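A compact sketch of the KMP idea, in its prefix-function formulation (the function name and the choice to count occurrences are mine):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Count occurrences of `pattern` in `text` with Knuth-Morris-Pratt.
// Runs in time proportional to text.size() + pattern.size().
int kmpCount(const std::string& text, const std::string& pattern) {
    int m = pattern.size();
    if (m == 0) return 0;
    // failure[i] = length of the longest proper prefix of the pattern
    // that is also a suffix of pattern[0..i]
    std::vector<int> failure(m, 0);
    for (int i = 1, k = 0; i < m; ++i) {
        while (k > 0 && pattern[i] != pattern[k]) k = failure[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        failure[i] = k;
    }
    int count = 0;
    for (int i = 0, k = 0; i < (int)text.size(); ++i) {
        while (k > 0 && text[i] != pattern[k]) k = failure[k - 1];
        if (text[i] == pattern[k]) ++k;
        if (k == m) {            // full match ending at position i
            ++count;
            k = failure[k - 1];  // keep going, allowing overlapping matches
        }
    }
    return count;
}
```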
And there are countless such stories…
Conclusion
In general, software development involves learning new technologies every day. But without a grounding in algorithms, you will not be able to optimize the code you write or use in your projects.
We focused specifically on the scalability of algorithms. A software system consists of many such algorithms, and optimizing any one of them reduces the system’s overall computational cost.
However, it is important to note that this is not the only way to make a system scalable. For example, a technique known as distributed computing allows independent parts of a program to run on multiple computers together, making it even more scalable.