• 04 . 02 . 10
  • Modelling Set operations in PHP using object oriented classes to represent Union, Intersection and Difference.

  • Tags

    , , , ,

  • StumbleUpon

Set Operations in PHP

The standard set operations of Intersection, Union and Difference are well-understood and have a wide number of applications. Martin Fowler gives a neat example here, for modelling recurring events where there are multiple conditions.

“The last day of every second month” for example, can be represented by an intersection of two conditions, “last day of the month” and “every 2 months”. More generally, they can be applied to any situation where you want to do combinations of criteria using AND and OR.

To model this in PHP, we can define a number of Set Operation classes to which we can attach 2 or more elements for testing. The 3 classes, Union, Intersect and Difference will all be constructed in the same way, so constructor code can be placed in a common ancestor:

abstract class Celsus_Set_Operation_Abstract {
	protected $_setInterface = null;

	public function __construct($setInterface) {
		if (!is_string($setInterface)) {
			throw new Celsus_Exception("Interface must be a string.");
		}
		if (!interface_exists($setInterface, true)) {
			throw new Celsus_Exception("$setInterface is not a valid interface.");
		}
		$this->_setInterface = $setInterface;
	}

	/**
	 * Adds an element to the set.
	 *
	 * @param StdClass $element
	 */
	public function addElement($element) {
		if ($element instanceof $this->_setInterface || $element instanceof Celsus_Set_Operation_Abstract) {
			$this->_elements[] = $element;
		} else {
			throw new Celsus_Exception("Element must implement $this->_setInterface or Set");
		}
	}
}

In order to test the elements of the set, we need to ensure that they are all testable in the same way. We could force a particular method name, such as ‘contains’ or ‘includes’, but it turns out we don’t have to. As long as they all have the same testing function, we don’t actually care what it is called, and library code should be as transparent as possible. Therefore we construct the set operator with an interface, which all set elements must implement.

The addElement function provides a method by which we can add elements to the set. If the element doesn’t meet the required interface, it is rejected.

Next, let’s define a set of classes that can be compared for inclusion or exclusion. The only requirement is that they must all implement the same interface containing a single method, and that that method returns a boolean.

interface Celsus_Test_Set_Interface {
	public function acceptable();
}

class Celsus_Test_Set_Acceptable implements Celsus_Test_Set_Interface {
	public function acceptable() {
		return true;
	}
}

class Celsus_Test_Set_Unacceptable implements Celsus_Test_Set_Interface {
	public function acceptable() {
		return false;
	}
}

This is about as simple as you can get. The ‘Acceptable’ class represents an object for which the criteria is always passed. The ‘Unacceptable’ class represents one for which the criteria is always failed. In practise, these would be more complex. For dates, for example, you could test whether the supplied date met the “last day of the month criteria”. These simple classes will suffice for now though, and allow us to discuss the set mechanisms.

An intersection is true if every element of the set meets the criteria, and can be represented as follows:

class Celsus_Set_Operation_Intersect extends Celsus_Set_Operation_Abstract
	protected function __call($method, $arguments) {
		// First check that the method we are calling is defined in the specified interface.
		if (!method_exists($this->_setInterface, $method)) {
			throw new Celsus_Exception("$method is not defined in $this->_setInterface");
		}

		if (!$this->_elements) {
			// We don't have any elements.
			return false;
		}

		// Iterate through the elements, calling the supplied method name on each.
		// If any return false, this intersection is false.
		foreach ($this->_elements as $element) {
			if (!call_user_func_array(array($element, $method), $arguments)) {
				return false;
			}
		}
		return true;
	}

Because we don’t know what the name of the containing function is going to be, we use the magic __call method to trap any method calls. We test that method name against the interface we constructed the Intersection with, and if it exists, we can begin. If no elements have been added to the set, the intersection returns false. Finally, we iterate through each element in the set and test its comparer function. As an intersection, if any element fails its test, the intersection fails.

We can demonstrate this using our simple test cases above:

$intersection = new Celsus_Set_Operation_Intersection('Celsus_Test_Set_Interface');
$intersection->addElement(new Celsus_Test_Set_Acceptable());
$intersection->addElement(new Celsus_Test_Set_Acceptable());
var_dump($intersection->acceptable()); // True

$intersection2 = new Celsus_Set_Operation_Intersection('Celsus_Test_Set_Interface');
$intersection2->addElement(new Celsus_Test_Set_Acceptable());
$intersection2->addElement(new Celsus_Test_Set_Unacceptable());
var_dump($intersection->acceptable());  // False.

First, we construct the intersection, then we add some elements, and finally we test. Note that the method call is to ‘acceptable()’, which is the method specified in the interface used to construct the intersection.

Similar classes can be constructed for Union and Difference which work in much the same way. Set difference is a little more complex, however, as we have to define both a list of criteria to exclude, and a list of criteria to include.

But what if we want to combine these set operations, for more complex tests? As a further refinement we can allow the set operations to themselves include other set operations, which allows for complex constructions like the following:

$intersection = new Celsus_Set_Operation_Intersection('Celsus_Test_Set_Interface');
$elementA = new Celsus_Test_Set_Acceptable();
$elementB = new Celsus_Test_Set_Acceptable();
$elementC = new Celsus_Test_Set_Unacceptable();
$intersection->addElement($elementA);
$intersection->addElement($elementB);
$union = new Celsus_Set_Operation_Union('Celsus_Test_Set_Interface');
$union->addElement($intersection);
$union->addElement($elementC);
var_dump($intersection->acceptable());  // True.

Which translates to “(elementA AND elementB) OR elementC”. The flexible structure allows any combination of these, at any depth.

A full set of code and unit tests that implement all of the above is available here. If you have any questions on their use, let me know in the comments.

From here, the next step is to define a more useful set of criteria. Using dates as an example, we can arrange for readable code that quickly determines whether a date is in a given set of recurrence conditions:

$intersection = new Celsus_Set_Operation_Intersection('Celsus_Temporal_Expression_Interface');
$intersection->addElements(array($every_month, $last_day_of_the_month));
var_dump($intersection->includes('2010-01-31')); // True
var_dump($intersection->includes('2010-06-02')); // False

I’ll go through the steps on how to achieve that in another post.