If you have ever worked with threads, and work queues in particular, you know how convenient they can be. Consider the following scenario: an action or input on a web page triggers something that might take a (very) long time to execute. If it runs during the browser request, it not only annoys the user, who has to wait for the page to load, but might also hit a timeout that interrupts the processing. How do we solve this?
I’ll use the term job for a piece of work that needs to be executed; in practice this is an isolated PHP function that takes an unspecified amount of time to run.
Some possible solutions
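My solution ended up being a mix of the two above. It consists of three major parts and, a bit simplified, works like this:

1. A PHP script (typically a web page) calls enqueue() on a jobs object. The job is stored in an SQL table, and a worker process is started unless one is already running.
2. The worker (worker.php) forks, lets its parent return immediately so the web page can finish, and then works through the queued jobs one at a time.
3. For each job, the worker starts job_execute.php in a fresh PHP process, which includes the files the job needs and calls the job function.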
[Figure: block diagram showing the interaction of the components]

And voilà, we have asynchronous background execution of (almost) arbitrary PHP functions. There are a few problems left to solve; we’ll tackle those further down.

Requirements
SQL database
First, we’ll need some database support. I’ve used MySQL, but any SQL database should work fine (the CREATE TABLE statement below uses MySQL’s AUTO_INCREMENT syntax, so adjust it for other databases). Only one simple table is required; it’s called jobs and looks like this:

    +-------+---------------------+------+-----+---------+----------------+
    | Field | Type                | Null | Key | Default | Extra          |
    +-------+---------------------+------+-----+---------+----------------+
    | jid   | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
    | data  | text                | YES  |     | NULL    |                |
    +-------+---------------------+------+-----+---------+----------------+

The column jid is an arbitrary job id and the column data contains the job information (what exactly that is will be covered further down). You can create this table with the following SQL command:

    CREATE TABLE jobs (
        jid  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        data TEXT,
        PRIMARY KEY (jid)
    );

PHP implementation
The implementation consists of three files; we’ll focus on the jobs class first. This brings us to what exactly the data column should contain. Since the job (PHP function) executes in a new context, the worker must be able to bring in PHP files so that the functions and classes it needs can be resolved. So, in addition to the actual job function and an opaque argument, we also need to store a list of PHP files to include at execution time. The complete data structure stored in the data column looks like this:

    $data = array(
        /* Version identifier of job structure */
        'version'  => 1,
        /* Array of PHP files to include at execution time */
        'include'  => array(),
        /* Name of actual job function */
        'callback' => '',
        /* Opaque argument passed to callback function */
        'args'     => null,
    );

A dedicated class called jobs manages jobs and allows enqueuing and dequeuing of jobs. For clarity, only the function prototypes are shown here; the full source code can be found at the bottom of the page.

    /* Prototypes only; method bodies omitted */
    class jobs
    {
        private $m_con;            /* SQL connector */
        const jobversion = '1';    /* Job structure version */

        private $m_data = array(
            'version'  => self::jobversion,
            'include'  => array(),
            'callback' => '',
            'args'     => null,
        );

        /*
         * Enqueue a job for execution
         *   cb   - Name of PHP function to execute
         *   args - Arguments to pass to 'cb'
         *   incs - Array of PHP files to include before execution
         */
        public function enqueue($cb, $args, $incs);

        /*
         * Return the job identifier (integer) of the
         * next available job, or -1 if no new jobs are
         * available.
         */
        public function nextJob();

        /*
         * Dequeue the job with id 'jid' for execution;
         * if no such job is available, null is returned.
         *
         * An array with the following keys is returned:
         * (jid, include, callback, args)
         */
        public function dequeue($jid);

        /*
         * Attempt to launch the worker process if it is not already running
         */
        private function startWorker();
    }

The callback function cb is called as cb(args); before the call, every file listed in the array incs is included using require_once. The worker is launched by startWorker():

    /* Outside the class: should be set to the path of the CLI PHP binary */
    define("PHP_PATH", "/usr/local/bin/php");

    /* Inside class jobs */
    private function startWorker()
    {
        /*
         * Attempt to read the preference
         * 'worker_pid' to check if the worker is
         * already running.
         */
        $prefs = new Prefs($this->m_con, 'jobs');
        $pid = $prefs->worker_pid;
        if ($pid != null)
            return;

        /* Get our working directory */
        $cwd = getcwd();

        /*
         * Construct a command such as the following
         * /usr/local/bin/php /working/path/worker.php
         */
        $cmd = escapeshellcmd(PHP_PATH . " $cwd/worker.php");
        $desc = array();

        /* Execute the command and wait for it to finish */
        $proc = proc_open($cmd, $desc, $pipes, NULL, NULL);
        proc_close($proc);
    }

The startWorker function is executed in the context of a web page (from inside the Apache PHP module, for example) and, as you can see, it doesn’t call pcntl_fork directly. Technically it still forks, otherwise it wouldn’t be able to execute another process, but we let the PHP module deal with that mess. This leads us to worker.php:

    require_once('jobs.inc');
    require_once('prefs.inc');
    require_once('db_mysql.inc');

    /* Avoid a "constant already defined" warning if jobs.inc defines it */
    if (!defined('PHP_PATH'))
        define("PHP_PATH", "/usr/local/bin/php");

    if (!isset($argv)) {
        exit("Don't call me from a browser");
    }

    /*
     * Fork ourselves and let the parent return directly to startWorker
     */
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("could not fork");
    } else if ($pid) {
        /*
         * This is the parent process. Record the child pid so that we
         * don't launch more processes than wanted.
         */
        $con = new MySQLConnector('localhost', 'user', 'pwd', 'db');
        $prefs = new Prefs($con, 'jobs');
        $prefs->worker_pid = $pid;
        $prefs->Flush();

        /*
         * Exit the script; it runs in a separate
         * PHP process, NOT inside Apache
         */
        exit;
    }

    $con = new MySQLConnector('localhost', 'user', 'pwd', 'db');
    $jobs = new jobs($con);

    $cwd = getcwd();
    if ($cwd === false)
        exit("can't get working directory");

    /*
     * Loop over all available jobs. Stop when the queue is empty or
     * when the same job id comes back twice in a row (i.e. the previous
     * run did not remove it from the queue).
     */
    $pjid = -1;
    for (;;) {
        $jid = $jobs->nextJob();
        if ($jid == -1 || $jid == $pjid)
            break;

        /*
         * Execute each job in a clean environment
         */
        $cmd = escapeshellcmd(PHP_PATH . " $cwd/job_execute.php $jid");
        $desc = array();
        $proc = proc_open($cmd, $desc, $pipes);
        proc_close($proc);
        $pjid = $jid;
    }

    /* Tell the world that we aren't executing any more */
    $prefs = new Prefs($con, 'jobs');
    $prefs->worker_pid = null;
    $prefs->Flush();    /* persist the change, as the parent branch does */

The first thing that happens is that worker.php forks itself and lets the parent return directly to the caller, which in this case is startWorker. This allows startWorker, and thus enqueue, to finish, and the calling PHP script can resume whatever it was doing (creating a web page, etc.). Also note that worker.php doesn’t execute the job functions directly; instead it hands the jid number to a second script called job_execute.php. There is a good reason for this: since we must include PHP files (with require_once) to be able to execute a job function, the namespace of the worker would quickly become contaminated, and with that comes the risk of name collisions. By letting each job execute in a totally clean environment, each job includes only the files it needs, which avoids name collisions. job_execute.php takes a job id (jid) as an argument, dequeues the job, includes all required files, calls the job function and then terminates.
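The clean-environment argument is easy to check on its own. Below is a small sketch, not part of the article’s source, built around a hypothetical helper run_isolated() that runs a snippet of PHP in a fresh CLI process via proc_open(), the same call worker.php uses, and returns the child’s exit status. It uses PHP_BINARY (PHP 5.4 and later) in place of the article’s hardcoded PHP_PATH; that substitution is an assumption made so the sketch runs anywhere.

```php
<?php
/*
 * Hypothetical helper (not from the article): run a snippet of PHP code
 * in a brand-new CLI process and return the child's exit status.
 * PHP_BINARY (PHP >= 5.4) stands in for the article's PHP_PATH constant.
 */
function run_isolated(string $code): int
{
    $cmd = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg($code);

    // No pipes are needed; like worker.php we only wait for the child.
    $proc = proc_open($cmd, array(), $pipes);
    if (!is_resource($proc)) {
        return -1;
    }

    // proc_close() waits for the child and returns its exit status.
    return proc_close($proc);
}

// The child defines a function and reports success through its exit status.
$status = run_isolated(
    'function job_helper() { return 42; } exit(job_helper() === 42 ? 3 : 1);'
);

// The function existed only in the child; the parent never sees it.
var_dump($status);
var_dump(function_exists('job_helper'));
```

A function defined in the child never becomes visible in the parent, which is why worker.php can run job after job without their includes colliding. Because proc_close() hands back the child’s exit status, the worker could also log failed jobs if job_execute.php were changed to exit with a non-zero status on error.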
File: job_execute.php

    require_once('jobs.inc');
    require_once('db_mysql.inc');

    if (!isset($argv)) {
        exit("Don't call me from a browser");
    }
    if (!isset($argv[1])) {
        exit("Job missing");
    }
    $jid = (int)$argv[1];

    $con = new MySQLConnector('localhost', 'user', 'pwd', 'db');
    $jobs = new jobs($con);
    $job = $jobs->dequeue($jid);
    if ($job == null)
        exit("Job dequeue failed");

    /*
     * Include required files
     */
    foreach ($job['include'] as $inc) {
        require_once($inc);
    }

    /* Execute the job handler */
    $cb = $job['callback'];
    $cb($job['args']);

Okay, that should be all. Are you still with me?

Using it
After all this, how does one use it? First of all, we need one or more job functions that we would like to execute in the background. The prototype for such a function looks like this:

    function myBgFunc($args) { ... }

Here, args can be anything you like (as long as PHP allows it). Put the function inside a file and include (preferably with require_once) any other files that you might need. Here is an example; not the best one, but it will do.

File: myfunc.php

    <?php
    /* $args is an integer in this case */
    function test1($args)
    {
        $j = 1;
        for ($i = 0; $i < $args; $i++) {
            $j *= 1.00001;
        }
        /* Report result somewhere */
    }
    ?>

Now, from your run-of-the-mill PHP script, create a jobs object and enqueue the function test1:

    require_once('db_mysql.inc');
    require_once('jobs.inc');

    $con = new MySQLConnector('localhost', 'user', 'pwd', 'db');
    $jobs = new jobs($con);

    /* Remember: "function", "argument", "files needed" */
    $jobs->enqueue("test1", rand() * 1000, array('myfunc.php'));
    print "Job enqueued";

Even if test1 takes ages to execute, enqueue returns in a fraction of a second and the page is sent back to the web browser right away. You can enqueue any number of functions; they are executed on a first-come, first-served basis.

Improvements
I’ve tested this a bit and it seems to work; however, I haven’t put it through a real-world test yet.
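One hardening step worth considering before real-world use: job_execute.php calls whatever function name it finds in the data column, and a missing include or an exception simply ends the process with no clear signal to the worker. The sketch below is an assumption, not the article’s code: a hypothetical helper run_job() that checks the array returned by dequeue() (keys include, callback and args), includes the files, and turns each kind of failure into a distinct non-zero exit status. It writes to STDERR, which is only defined under the CLI, where job_execute.php runs anyway.

```php
<?php
/*
 * Hypothetical helper (not from the article): validate a job array as
 * returned by jobs::dequeue() and run it. Returns a process exit status,
 * 0 on success, so job_execute.php can pass it to exit().
 */
function run_job(array $job): int
{
    // The keys documented for dequeue(): (jid, include, callback, args).
    foreach (array('include', 'callback', 'args') as $key) {
        if (!array_key_exists($key, $job)) {
            fwrite(STDERR, "job is missing '$key'\n");
            return 2;
        }
    }

    // Check each include against the include_path before requiring it.
    foreach ((array)$job['include'] as $inc) {
        if (stream_resolve_include_path($inc) === false) {
            fwrite(STDERR, "cannot find include file $inc\n");
            return 3;
        }
        require_once $inc;
    }

    // Only plain, already-defined function names are accepted.
    $cb = $job['callback'];
    if (!is_string($cb) || !function_exists($cb)) {
        fwrite(STDERR, "unknown job function\n");
        return 4;
    }

    // Catch both exceptions and engine errors (Throwable, PHP 7+).
    try {
        $cb($job['args']);
    } catch (Throwable $e) {
        fwrite(STDERR, 'job failed: ' . $e->getMessage() . "\n");
        return 5;
    }
    return 0;
}

// Example: a job whose function is already loaded, so no includes are needed.
function demo_job($args)
{
    echo "demo_job got $args\n";
}

$status = run_job(array('jid' => 1, 'include' => array(),
                        'callback' => 'demo_job', 'args' => 7));
echo "exit status: $status\n";
```

In job_execute.php, everything after a successful dequeue() could then be replaced by exit(run_job($job));. Restricting callbacks to plain function names with function_exists() matches the article’s description of the callback as the name of a PHP function; strings such as 'Class::method' would be rejected.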
With some additional locking it should be possible to run multiple instances of worker.php to take advantage of SMP systems.

There is no built-in mechanism to know when a job has finished. The job function will need to report its results somewhere, for example to an SQL database. If browser feedback is needed, an AJAX solution could poll the database for the execution status.

Files
Files with complete source code

If you found this useful, consider putting a link back to this article from your own blog/website. Thanks.
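On the locking side, note that the worker_pid preference checked by startWorker() is a soft guard: two requests can both read an empty worker_pid before the forked worker records its pid, and if worker.php dies before clearing it, no new worker is ever started. Below is a minimal sketch, not the article’s code, of a guard based on flock(): the worker takes a non-blocking exclusive lock on a file and gives up if another worker already holds it. The function names and the lock file path are hypothetical. It relies on flock() locks being tied to an open file handle, so on Linux a second handle on the same file is refused even within one process.

```php
<?php
/*
 * Hypothetical single-instance guard based on flock().
 * Returns the open lock handle on success (keep it open for as long as
 * the worker runs), or null if another handle already holds the lock.
 */
function acquire_worker_lock(string $path)
{
    $fp = fopen($path, 'c');   // create if missing, never truncate
    if ($fp === false) {
        return null;
    }
    // LOCK_NB: fail immediately instead of waiting for the other worker.
    if (!flock($fp, LOCK_EX | LOCK_NB)) {
        fclose($fp);
        return null;
    }
    return $fp;
}

function release_worker_lock($fp): void
{
    flock($fp, LOCK_UN);
    fclose($fp);
}

$lockFile = sys_get_temp_dir() . '/jobs-worker.lock';

$first  = acquire_worker_lock($lockFile);  // the running worker
$second = acquire_worker_lock($lockFile);  // a second worker is refused
var_dump($first !== null, $second === null);

release_worker_lock($first);
```

In worker.php, the forked child would call acquire_worker_lock() first and exit if it gets null, keeping the handle open until the job loop ends; the operating system drops the lock when the process exits, even after a crash. startWorker() could then launch worker.php without consulting worker_pid, at the cost of a short-lived extra process when a worker is already running. This narrows the race rather than removing every one: a job enqueued while a worker is already past its last nextJob() call still waits for the next enqueue.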
6 Responses to “Asynchronous background execution with PHP”
That’s “exactly” just what I was looking for; many thanks!
Nice piece of code. What about the PHP timeout?
I believe that if PHP is running as a process, there is no need to set the timeout.
Hey, I would like to know if you’ve had any further experience with the pcntl_fork function and the different Apache processes that are started. I don’t maintain the server myself and I would like to avoid trouble with my hoster.
Pascal, speaking from 10 years of doing hosting of some form: if it’s shared, ask them, but generally I doubt they’ll be too keen. If it’s dedicated/virtual/managed/unmanaged, it really depends on your job size and how much you pay (they may oversell).
Thanks for the post, Fredrik. I’m going to look further into using your example and developing it into a distributed server monitoring tool I’ve started (which requires scheduling with potentially long processing times); I’m much more capable with PHP than C.
[...] interesting snippet for !php developers: http://www.shapeshifter.se/2008/08/04/asynchronous-background-execution-with-php/ [...]