Horizontal Fragmentation, min term predicates, primary horizontal fragmentation, distributed Database
Horizontal Fragmentation, Types and Min term predicates
In this tutorial, we will try to learn the followings;
- Horizontal Fragmentation
- Types of Horizontal Fragmentation
- Primary Horizontal Fragmentation
- Min term predicates
In database horizontal fragmentation, a relation (table) is divided into multiple sub-tables/sub-sets horizontally and by using the simple conditions.
For example, we have a relation (table) as Student(RollNo, Marks, University). Suppose the values for University attribute are ‘Stanford’, ‘California’, and ‘Harvard’, then the following SQL query can fragment the records (tuples) satisfying a simple condition.
SELECT * FROM Student WHERE University = ‘California’;
This query displays all the records related to the ‘California’ University, without any changes in the schema of the table. We can get three records if we change the University value in the WHERE clause of the above query, one for ‘California’, one for ‘Stanford’, and one for ‘Harvard’.
So this kind of horizontally fragmenting the whole table into multiple sub-tables without changing the table structure is called Horizontal Fragmentation. The concept is usually used to keep tuples (records) at the places where they are used the most, to minimize data transfer between far locations.
Types of Horizontal Fragmentation
- Primary Horizontal Fragmentation (PHF)
- Derived Horizontal Fragmentation (DHF)
Example of horizontal fragmentation of data for distributed database
Primary Horizontal Fragmentation (PHF)
Primary Horizontal Fragmentation is a table fragmentation technique in which we fragment a single table and this fragmentation is row-wise and using a set of simple conditions.
Note: Conditions are also called predicates.
Simple predicate
Given a table/relation R with set of attributes [A1, A2, A3, A4, …, An], a simple predicate Pi can be expressed as follows;
Pi : Aj θ Value
Where θ can be any of the symbols in the set {≤, ≥, ≠, <, >, =}, a value can be any value stored in the table for the attributed A i. For example, consider the following table Student given in Figure 1;
RollNo | Marks | University |
T01 | 33 | Harvard |
T03 | 77 | Stanford |
T04 | 23 | California |
T02 | 89 | California |
T05 | 90 | Harvard |
T06 | 90 | Harvard |
T07 | 15 | Stanford |
Figure 1: Student table
For the above table, we could define any simple predicates like University = ‘California’, University= ‘Harvard’, Marks < 77 etc using the above expression “Aj θ Value”.
Set of simple predicates
Set of simple predicates is set of all conditions collectively required to fragment a relation into subsets. For a table R, set of simple predicate can be defined as;
Predicate P = { P1, P2, …, Pn}
Example 1
As an example, for the above table Student, if simple conditions are, Marks < 77, Marks ≥ 77, then,
Set of simple predicates P1 = {Marks < 77, Marks ≥ 77}
Example 2
As another example, if simple conditions are, University = ‘California’, University= ‘Harvard’, Marks < 77, Marks ≥ 77, then,
Set of simple predicates P2 = { University = ‘California’, University= ‘Harvard’, Marks < 77, Marks ≥ 77}
Min-term Predicate
When we fragment any relation horizontally, we use single condition or set of simple predicates to filter the data. Given a relation R and set of simple predicates, we can fragment a relation horizontally as follows;
Fragment, Ri = σFi(R), 1 ≤ i ≤ n
where Fi is the set of simple predicates, also called as a Min-term predicate which can be written as follows;
Min-term predicate, Mi=P1 Λ P2 Λ P3 Λ … Λ Pn
Here, P1 means both P1 or ¬(P1), P2 means both P2 or ¬(P2), P3 means both P3 or ¬(P3), and so on. Using the conjunctive form of various simple predicates in different combination, we can derive many such min-term predicates.
For the example 1 stated previously, we can derive set of min-term predicates using the rules stated above as follows;
We will get 2n min-term predicates, where n is the number of simple predicates in the given predicate set. For P1, we have 2 simple predicates. Hence, we will get 4 (22) possible combinations of min-term predicates as follows;
m1 = {Marks < 77 Λ Marks ≥ 77}
m2 = {Marks < 77 Λ ¬(Marks ≥ 77)}
m3 = {¬(Marks < 77) Λ Marks ≥ 77}
m4 = {¬(Marks < 77) Λ ¬(Marks ≥ 77)}
Our next step is to choose the min-term predicates which can satisfy certain conditions to fragment a table and eliminate the others which are not useful. For example, the above set of min-term predicates can be applied each as a formula Fi stated in the above rule for fragment Ri as follows;
Student1 = σMarks< 77 Λ Marks ≥ 77(Student)
which can be written in equivalent SQL query as,
Student1
SELECT * FROM Student WHERE Marks < 77 AND Marks ≥ 77;
Student2 = σMarks< 77 Λ ¬(Marks ≥ 77)(Student)
which can be written in equivalent SQL query as,
Student2
SELECT * FROM Student WHERE Marks < 77 AND NOT Marks ≥ 77; where NOT Marks ≥ 77 is equivalent to Marks < 77.
Student3 = σ¬(Marks< 77) Λ Marks ≥ 77(Student)
which can be written in equivalent SQL query as,
Student3
SELECT * FROM Student WHERE NOT Marks < 77 AND Marks ≥ 77; where NOT Marks < 77 is equivalent to Marks ≥ 77.
Student4 = σ¬(Marks< 77) Λ ¬(Marks ≥ 77)(Student)
which can be written in equivalent SQL query as,
Student4
SELECT * FROM Student WHERE NOT Marks < 77 AND NOT Marks ≥ 77;
where NOT Marks < 77 is equivalent to Marks ≥ 77 and NOT Marks ≥ 77 is equivalent to Marks < 77. This is exactly same as the query for fragment Student1.
From these examples, it is very clear that the first query for fragment Student1 (min-term predicate m1) is invalid as any record in a table cannot have two values for any attribute in one record. That is, the condition (Marks < 77 Λ Marks ≥ 77) requires that the value for Marks must both be less than 77 and greater and equal to 77, which is not possible. Hence the condition violates and can be eliminated. For fragment Student2 (min-term predicate m2), the condition is (Marks<77 and Marks<77) which ultimately means Marks<77 which is correct. Likewise, fragment Student3 is valid and Student4 must be eliminated. Finally, we use the min-term predicates m2 and m3 to fragment the Student relation. The fragments can be derived as follows for Student;
SELECT * FROM Student WHERE Marks < 77;
Student2
RollNo | Marks | University |
T01 | 33 | Harvard |
T04 | 23 | California |
T07 | 15 | Stanford |
T05 | 90 | Harvard |
T06 | 90 | Harvard |
SELECT * FROM Student WHERE Marks ≥ 77;
Student3
RollNo | Marks | University |
T03 | 77 | Stanford |
T02 | 89 | California |
Correctness of Fragmentation
We horizontally fragment the table with the help of min term predicates. Now, we need to validate all the chosen fragments for their correctness. We need to verify did we miss anything? We use the following rules to ensure that we have not changed semantic information about the table which we fragment.
Completeness
If a relation R is fragmented into a set of fragments, then a record of Relation R must be found in any one or more of the fragments. This rule ensures that we have not lost any records during the process of fragmentation.
Reconstruction
After fragmenting a table, we must be able to reconstruct it back to its original form without any data loss through some relational operation. Reconstruction can be performed with the help of union operation.This rule ensures that we can construct a base table back from its fragments without losing any information. That is, we can write any queries involving the join of fragments to get the original relation back. For example;
(SELECT * FROM Student2) Union (SELECT * FROM Student3);
Disjointness – If a relation R is fragmented into a set of sub-tables R1, R2, R3, …, Rn, a record belongs to R1 is not found in any other fragment. This ensures that R1 ≠ R2.
For example, consider the Student table in Figure 1 and its fragments Student2, and Student3 created using the min-term predicates we derived.
From the tables Student2, and Student3 it is clear that the fragmentation is Complete. That is, we have not missed any records.
Just all are included into one of the sub-tables.
When we use an operation, say Union between Student2, and Student3 we will be able to get the original relation Student.
(SELECT * FROM Student2) Union (SELECT * FROM Student3);
The above query will get us Student back without loss of any information. Hence, the fragments created can be reconstructed.
Finally, if we write a query as follows, we will get a Null set as output. It ensures that the Disjointness property is satisfied.
(SELECT * FROM Student2) Intersect (SELECT * FROM Student3);
We get a null set as result for this query because there is no record common in both relations Student2 and Student3.
For the example 2, recall the set of simple predicates which was as follows;
Set of simple predicates P2 = { University = ‘California’, University= ‘Harvard’, Marks < 77, Marks ≥ 77}
We can derive the following min-term predicates;
m1 = { University = ‘California’ Λ University= ‘Harvard’ Λ Marks < 77 Λ Marks ≥ 77} m2 = { University = ‘California’ Λ University= ‘Harvard’ Λ Marks < 77 Λ ¬(Marks ≥ 77)} m3 = { University = ‘California’ Λ University= ‘Harvard’ Λ ¬(Marks < 77) Λ Marks ≥ 77} m4 = { University = ‘California’ Λ ¬(University= ‘Harvard’) Λ Marks < 77 Λ Marks ≥ 77}
mn = { ¬(University = ‘California’) Λ ¬(University= ‘Harvard’) Λ ¬(Marks < 77) Λ ¬(Marks ≥ 77)}
As in the previous example, out of 16 (24) min-term predicates, the set of min-term predicates which are not valid should be eliminated. At last, we would have the following set of valid min-term predicates.
m1 = { University = ‘California’ Λ ¬(University= ‘Harvard’) Λ ¬(Marks < 77) Λ Marks ≥ 77} m2 = { University = ‘California’ Λ ¬(University= ‘Harvard’) Λ Marks < 77 Λ ¬(Marks ≥ 77)} m3 = { ¬(University = ‘California’) Λ University= ‘Harvard’ Λ ¬(Marks < 77) Λ Marks ≥ 77} m4 = { ¬(University = ‘California’) Λ University= ‘Harvard’ Λ Marks < 77 Λ ¬(Marks ≥ 77)}
m5 = { ¬(University = ‘California’) Λ ¬(University= ‘Harvard’) Λ ¬(Marks < 77) Λ Marks ≥ 77}
m6 = { ¬(University = ‘California’) Λ ¬(University= ‘Harvard’) Λ Marks < 77 Λ ¬(Marks ≥ 77)}
The horizontal fragments using the above set of min-term predicates can be generated as follows;
Fragment 1: SELECT * FROM Student WHERE University = ‘California’ AND Marks ≥ 77;
Fragment 2: SELECT * FROM Student WHERE University = ‘California’ AND Marks < 77;
Fragment 3: SELECT * FROM Student WHERE University = ‘Harvard’ AND Marks ≥ 77;
Fragment 4: SELECT * FROM Student WHERE University = ‘Harvard’ AND Marks < 77;
The horizontal fragments using the above set of min-term predicates can be generated as follows;
Fragment 1: SELECT * FROM Student WHERE University = ‘California’ AND Marks ≥ 77;
Student1
RollNo | Marks | University |
T02 | 89 | California |
Fragment 2: SELECT * FROM Student WHERE University = ‘California’ AND Marks < 77;
Student2
RollNo | Marks | University |
T02 | 23 | California |
Fragment 3: SELECT * FROM Student WHERE University = ‘Harvard’ AND Marks ≥ 77;
Student3
RollNo | Marks | University |
T05 | 90 | Harvard |
T06 | 90 | Harvard |
Fragment 4: SELECT * FROM Student WHERE University = ‘Harvard’ AND Marks < 77;
Student4
RollNo | Marks | University |
T01 | 33 | Harvard |
In the STUDENT table, we have the third University ‘Stanford’, which was not specified in the set of simple predicates. Hence, in the fragmentation process, we must not leave the tuple with the value ‘Stanford’. That is the reason we have included the min-term predicates m5 and m6 which can be derived as follows;
Fragment 5: SELECT * FROM Student WHERE University <> ‘Harvard’ AND University <> ‘California’ AND Marks ≥ 77;
Student5
RollNo | Marks | University |
T03 | 77 | Stanford |
Fragment 6: SELECT * FROM Student WHERE University <> ‘Harvard’ AND University <> ‘California’ AND Marks < 77;
Student6
RollNo | Marks | University |
T07 | 15 | Stanford |
The correctness of fragmentation:
Completeness
The tuple of the table Student is distributed into different fragments. No records were omitted. Otherwise, by performing the union operation between all the Student table fragments Student1, Student2, Student3, and Student4, we will be able to get Student back without any information loss. Hence, the above fragmentation is Complete.
Reconstruction
As said before, by performing Union operation between all the fragments, we will be able to get the original table back. Hence, the fragmentation is correct and the reconstruction property is satisfied.
Disjointness
When we perform Intersect operation between all the above fragments, we will get null set as result, as we do not have any records in common for all the fragments. So we can say that disjointness property is satisfied.
Advantages of Fragmentation in database
- Due to unavailability of the irrelevant data, we can maintain the security and privacy of the database system.
- Efficiency of the database system is increased.
- Data is locally available, so Local query optimization techniques are sufficient for most of the database queries
Disadvantages of Fragmentation
- There is a lack of back-up copies of data in distributed area and it may cause the database to be ineffective if a site down.
- Reconstruction is expensive techniques in case of recursive fragmentation.