Horizontal Fragmentation, Types and Min term predicates

In this tutorial, we will try to learn the followings;

Horizontal Fragmentation
Types of Horizontal Fragmentation
Primary Horizontal Fragmentation
Min term predicates

In database horizontal fragmentation, a relation (table) is divided into multiple sub-tables/sub-sets horizontally and by using the simple conditions. For example, we have a relation (table) as Student(RollNo, Marks, University). Suppose the values for University attribute are ‘Stanford’, ‘California’, and ‘Harvard’, then the following SQL query can fragment the records (tuples) satisfying a simple condition. SELECT * FROM Student WHERE University = ‘California’; This query displays all the records related to the ‘California’ University, without any changes in the schema of the table. We can get three records if we change the University value in the WHERE clause of the above query, one for ‘California’, one for ‘Stanford’, and one for ‘Harvard’. So this kind of horizontally fragmenting the whole table into multiple sub-tables without changing the table structure is called Horizontal Fragmentation. The concept is usually used to keep tuples (records) at the places where they are used the most, to minimize data transfer between far locations.

Types of Horizontal Fragmentation

Primary Horizontal Fragmentation (PHF)
Derived Horizontal Fragmentation (DHF)

Example of horizontal fragmentation of data for distributed database

Primary Horizontal Fragmentation (PHF)

Primary Horizontal Fragmentation is a table fragmentation technique in which we fragment a single table and this fragmentation is row-wise and using a set of simple conditions. Note: Conditions are also called predicates. Simple predicate Given a table/relation R with set of attributes [A₁, A₂, A3, A4, …, A_n], a simple predicate P_i can be expressed as follows; P_i : A_j θ Value Where θ can be any of the symbols in the set {≤, ≥, ≠, <, >, =}, a value can be any value stored in the table for the attributed A _i. For example, consider the following table Student given in Figure 1;

RollNo	Marks	University
T01	33	Harvard
T03	77	Stanford
T04	23	California
T02	89	California
T05	90	Harvard
T06	90	Harvard
T07	15	Stanford

Figure 1: Student table For the above table, we could define any simple predicates like University = ‘California’, University= ‘Harvard’, Marks < 77 etc using the above expression “Aj θ Value”.

Set of simple predicates

Set of simple predicates is set of all conditions collectively required to fragment a relation into subsets. For a table R, set of simple predicate can be defined as; Predicate P = { P₁, P₂, …, P_n} Example 1 As an example, for the above table Student, if simple conditions are, Marks < 77, Marks ≥ 77, then, Set of simple predicates P1 = {Marks < 77, Marks ≥ 77} Example 2 As another example, if simple conditions are, University = ‘California’, University= ‘Harvard’, Marks < 77, Marks ≥ 77, then, Set of simple predicates P2 = { University = ‘California’, University= ‘Harvard’, Marks < 77, Marks ≥ 77}

Min-term Predicate

When we fragment any relation horizontally, we use single condition or set of simple predicates to filter the data. Given a relation R and set of simple predicates, we can fragment a relation horizontally as follows; Fragment, R_i = σ_Fi(R), 1 ≤ i ≤ n where F_i is the set of simple predicates, also called as a Min-term predicate which can be written as follows; Min-term predicate, M_i=P₁ Λ P₂ Λ P₃ Λ … Λ P_n Here, P₁ means both P₁ or ¬(P₁), P₂ means both P₂ or ¬(P₂), P₃ means both P₃ or ¬(P₃), and so on. Using the conjunctive form of various simple predicates in different combination, we can derive many such min-term predicates. For the example 1 stated previously, we can derive set of min-term predicates using the rules stated above as follows; We will get 2ⁿ min-term predicates, where n is the number of simple predicates in the given predicate set. For P1, we have 2 simple predicates. Hence, we will get 4 (2²) possible combinations of min-term predicates as follows; m₁ = {Marks < 77 Λ Marks ≥ 77} m₂ = {Marks < 77 Λ ¬(Marks ≥ 77)} m₃ = {¬(Marks < 77) Λ Marks ≥ 77} m₄ = {¬(Marks < 77) Λ ¬(Marks ≥ 77)} Our next step is to choose the min-term predicates which can satisfy certain conditions to fragment a table and eliminate the others which are not useful. For example, the above set of min-term predicates can be applied each as a formula Fi stated in the above rule for fragment Ri as follows; ^Student1 ⁼^σMarks< 77 Λ Marks ≥ 77^(Student) which can be written in equivalent SQL query as, Student₁ SELECT * FROM Student WHERE Marks < 77 AND Marks ≥ 77; ^Student2 ⁼^σMarks< 77 Λ ¬(Marks ≥ 77)^(Student) which can be written in equivalent SQL query as, Student₂ SELECT * FROM Student WHERE Marks < 77 AND NOT Marks ≥ 77; where NOT Marks ≥ 77 is equivalent to Marks < 77. ^Student3 ^{= σ}¬(Marks< 77) Λ Marks ≥ 77^(Student) which can be written in equivalent SQL query as, Student₃ SELECT * FROM Student WHERE NOT Marks < 77 AND Marks ≥ 77; where NOT Marks < 77 is equivalent to Marks ≥ 77. ^Student4 ^{= σ}¬(Marks< 77) Λ ¬(Marks ≥ 77)^(Student) which can be written in equivalent SQL query as, Student₄ SELECT * FROM Student WHERE NOT Marks < 77 AND NOT Marks ≥ 77; where NOT Marks < 77 is equivalent to Marks ≥ 77 and NOT Marks ≥ 77 is equivalent to Marks < 77. This is exactly same as the query for fragment Student₁. From these examples, it is very clear that the first query for fragment Student₁ (min-term predicate m₁) is invalid as any record in a table cannot have two values for any attribute in one record. That is, the condition (Marks < 77 Λ Marks ≥ 77) requires that the value for Marks must both be less than 77 and greater and equal to 77, which is not possible. Hence the condition violates and can be eliminated. For fragment Student₂ (min-term predicate m₂), the condition is (Marks<77 and Marks<77) which ultimately means Marks<77 which is correct. Likewise, fragment Student₃ is valid and Student₄ must be eliminated. Finally, we use the min-term predicates m2 and m3 to fragment the Student relation. The fragments can be derived as follows for Student; SELECT * FROM Student WHERE Marks < 77; Student₂

RollNo	Marks	University
T01	33	Harvard
T04	23	California
T07	15	Stanford
T05	90	Harvard
T06	90	Harvard

SELECT * FROM Student WHERE Marks ≥ 77; Student₃

RollNo	Marks	University
T03	77	Stanford
T02	89	California

Correctness of Fragmentation

We horizontally fragment the table with the help of min term predicates. Now, we need to validate all the chosen fragments for their correctness. We need to verify did we miss anything? We use the following rules to ensure that we have not changed semantic information about the table which we fragment.

Completeness

If a relation R is fragmented into a set of fragments, then a record of Relation R must be found in any one or more of the fragments. This rule ensures that we have not lost any records during the process of fragmentation.

Reconstruction

After fragmenting a table, we must be able to reconstruct it back to its original form without any data loss through some relational operation. Reconstruction can be performed with the help of union operation.This rule ensures that we can construct a base table back from its fragments without losing any information. That is, we can write any queries involving the join of fragments to get the original relation back. For example; (SELECT * FROM Student2) Union (SELECT * FROM Student3); Disjointness – If a relation R is fragmented into a set of sub-tables R₁, R₂, R3, …, R_n, a record belongs to R₁ is not found in any other fragment. This ensures that R₁ ≠ R₂. For example, consider the Student table in Figure 1 and its fragments Student₂, and Student₃ created using the min-term predicates we derived. From the tables Student₂, and Student₃ it is clear that the fragmentation is Complete. That is, we have not missed any records. Just all are included into one of the sub-tables. When we use an operation, say Union between Student₂, and Student₃ we will be able to get the original relation Student. (SELECT * FROM Student2) Union (SELECT * FROM Student3); The above query will get us Student back without loss of any information. Hence, the fragments created can be reconstructed. Finally, if we write a query as follows, we will get a Null set as output. It ensures that the Disjointness property is satisfied. (SELECT * FROM Student2) Intersect (SELECT * FROM Student3); We get a null set as result for this query because there is no record common in both relations Student₂ and Student₃. For the example 2, recall the set of simple predicates which was as follows; Set of simple predicates P2 = { University = ‘California’, University= ‘Harvard’, Marks < 77, Marks ≥ 77} We can derive the following min-term predicates; m₁ = { University = ‘California’ Λ University= ‘Harvard’ Λ Marks < 77 Λ Marks ≥ 77} m₂ = { University = ‘California’ Λ University= ‘Harvard’ Λ Marks < 77 Λ ¬(Marks ≥ 77)} m₃ = { University = ‘California’ Λ University= ‘Harvard’ Λ ¬(Marks < 77) Λ Marks ≥ 77} m₄ = { University = ‘California’ Λ ¬(University= ‘Harvard’) Λ Marks < 77 Λ Marks ≥ 77} m_n = { ¬(University = ‘California’) Λ ¬(University= ‘Harvard’) Λ ¬(Marks < 77) Λ ¬(Marks ≥ 77)} As in the previous example, out of 16 (2⁴) min-term predicates, the set of min-term predicates which are not valid should be eliminated. At last, we would have the following set of valid min-term predicates. m₁ = { University = ‘California’ Λ ¬(University= ‘Harvard’) Λ ¬(Marks < 77) Λ Marks ≥ 77} m₂ = { University = ‘California’ Λ ¬(University= ‘Harvard’) Λ Marks < 77 Λ ¬(Marks ≥ 77)} m₃ = { ¬(University = ‘California’) Λ University= ‘Harvard’ Λ ¬(Marks < 77) Λ Marks ≥ 77} m₄ = { ¬(University = ‘California’) Λ University= ‘Harvard’ Λ Marks < 77 Λ ¬(Marks ≥ 77)} m₅ = { ¬(University = ‘California’) Λ ¬(University= ‘Harvard’) Λ ¬(Marks < 77) Λ Marks ≥ 77} m₆ = { ¬(University = ‘California’) Λ ¬(University= ‘Harvard’) Λ Marks < 77 Λ ¬(Marks ≥ 77)} The horizontal fragments using the above set of min-term predicates can be generated as follows; Fragment 1: SELECT * FROM Student WHERE University = ‘California’ AND Marks ≥ 77; Fragment 2: SELECT * FROM Student WHERE University = ‘California’ AND Marks < 77; Fragment 3: SELECT * FROM Student WHERE University = ‘Harvard’ AND Marks ≥ 77; Fragment 4: SELECT * FROM Student WHERE University = ‘Harvard’ AND Marks < 77; The horizontal fragments using the above set of min-term predicates can be generated as follows; Fragment 1: SELECT * FROM Student WHERE University = ‘California’ AND Marks ≥ 77; Student₁

RollNo	Marks	University
T02	89	California

Fragment 2: SELECT * FROM Student WHERE University = ‘California’ AND Marks < 77; Student₂

RollNo	Marks	University
T02	23	California

Fragment 3: SELECT * FROM Student WHERE University = ‘Harvard’ AND Marks ≥ 77; Student₃

RollNo	Marks	University
T05	90	Harvard
T06	90	Harvard

Fragment 4: SELECT * FROM Student WHERE University = ‘Harvard’ AND Marks < 77; Student₄

RollNo	Marks	University
T01	33	Harvard

In the STUDENT table, we have the third University ‘Stanford’, which was not specified in the set of simple predicates. Hence, in the fragmentation process, we must not leave the tuple with the value ‘Stanford’. That is the reason we have included the min-term predicates m₅ and m₆ which can be derived as follows; Fragment 5: SELECT * FROM Student WHERE University <> ‘Harvard’ AND University <> ‘California’ AND Marks ≥ 77; Student₅

RollNo	Marks	University
T03	77	Stanford

Fragment 6: SELECT * FROM Student WHERE University <> ‘Harvard’ AND University <> ‘California’ AND Marks < 77; Student₆

RollNo	Marks	University
T07	15	Stanford

The correctness of fragmentation:

Completeness

The tuple of the table Student is distributed into different fragments. No records were omitted. Otherwise, by performing the union operation between all the Student table fragments Student₁, Student₂, Student₃, and Student₄, we will be able to get Student back without any information loss. Hence, the above fragmentation is Complete. Reconstruction As said before, by performing Union operation between all the fragments, we will be able to get the original table back. Hence, the fragmentation is correct and the reconstruction property is satisfied.

Disjointness

When we perform Intersect operation between all the above fragments, we will get null set as result, as we do not have any records in common for all the fragments. So we can say that disjointness property is satisfied.

Advantages of Fragmentation in database

Due to unavailability of the irrelevant data, we can maintain the security and privacy of the database system.
Efficiency of the database system is increased.
Data is locally available, so Local query optimization techniques are sufficient for most of the database queries

Disadvantages of Fragmentation

There is a lack of back-up copies of data in distributed area and it may cause the database to be ineffective if a site down.
Reconstruction is expensive techniques in case of recursive fragmentation.

Horizontal Fragmentation, min term predicates, primary horizontal fragmentation, distributed Database
By: Prof. Dr. Fazal Rehman | Last updated: March 3, 2022

Horizontal Fragmentation, Types and Min term predicates

Types of Horizontal Fragmentation

Example of horizontal fragmentation of data for distributed database

Primary Horizontal Fragmentation (PHF)

Set of simple predicates

Min-term Predicate

Correctness of Fragmentation

Completeness

Reconstruction

The correctness of fragmentation:

Completeness

Disjointness

Advantages of Fragmentation in database

Disadvantages of Fragmentation

Related Posts:

Leave a Comment