Commit Graph

35 Commits

Author SHA1 Message Date
Brion Vibber
c74aea589d Stomp queue restructuring for mass scalability:
- Multiplexing queues into groups and for multiple sites.
- Sharing vs breakout configurable per site and per queue via $config['queue']['breakout']
- Detect how many times a message is redelivered, discard if it's killed too many daemons
 - count configurable with $config['queue']['max_retries']
 - can dump the items to files in $config['queue']['dead_letter_dir']

Queue daemon memory & resource leak fixes:
- avoid unnecessary reconnections to memcached server (switch persistent connections back in on second initialization, assuming it's child process)
- monkey-patch for leaky .ini loads in DB_DataObject::databaseStructure() - was leaking 200k per active switch
- applied leak fixes to Status_network as well, using intermediate base Safe_DataObject for both it and Memcache_DataObject

Misc queue fixes:
- correct handling of child processes exiting due to signal termination instead of regular exit
- shutdown instead of infinite respawn loop if we're already past the soft memory limit at startup
- Added --all option for xmppdaemon... still opens one xmpp connection per site that has xmpp active

Cache updates:
- add Cache::increment() method with native support for memcached atomic increment
2010-02-16 09:16:51 -08:00
Brion Vibber
3d0c3f0577 Pull fix from testing branch: use new encoding funcs w/ stomp queues 2010-02-16 09:15:29 -08:00
Brion Vibber
d9c9b2a12f Queue daemon fixes:
* skip unnecessary unsubscribes on graceful shutdown -- takes a long time for many queues, slows down our restarts when hitting graceful mem limit
* fix control channel (was broken when we switched to support multiple queue servers)
2010-02-10 10:59:30 -08:00
Evan Prodromou
4ae31f3476 on exceptions, stomp logs the error and reenqueues 2010-01-30 13:15:17 -05:00
Brion Vibber
155a5d446f Manual failover for stomp queues.
If an array of multiple servers is put in $config['queue']['stomp_server'], enqueues will pick a random server to send to (failing over automatically if any are down).
Queue handling daemons connect all servers so they get events no matter where they were delivered.
In case of disconnection, daemons should now handle it gracefully and attempt to reconnect every 60 seconds or so, automatically resubscribing to all queues once it's back up.

Can put to 'native' failover for reads as well by disabling $config['stomp']['manual_failover'] = false; but this is untested and may explode in addition to requiring that your ActiveMQ cluster actually be set up to handle its own data distribution.

Additionally, can choose which queues to mark as persistent by setting $config['stomp']['persistent'] to an array of queue names.
2010-01-28 16:49:32 -08:00
Brion Vibber
a868a523a5 Can now set $config['queue']['stomp_persistent'] = false; to explicitly disable persistence when we queue items 2010-01-28 09:52:35 -08:00
Brion Vibber
427ac3a3a6 debug log line for control channel sub 2010-01-27 20:51:04 -08:00
Brion Vibber
c51539804a Add persistent:true property to Stomp messages so ActiveMQ doesn't decide to discard them even though persistence is enabled on the broker. :) (Thanks Aric!) 2010-01-27 09:24:59 -08:00
Brion Vibber
58be61b641 Control channel for queue daemons to request graceful shutdown, restart, or update to listen to a newly added or reconfigured site.
queuectl.php --update -s<site>
  queuectl.php --stop
  queuectl.php --restart

Default control channel is /topic/statusnet-control. For external utilities to send a site update ping direct to the queue server, connect via Stomp and send a message formatted thus:

  update:<nickname>

(Nickname here, *not* server hostname! The rest of the queues will be updated to use nicknames later.)

Note that all currently-connected queue daemons will get these notifications, including both queuedaemon.php and xmppdaemon.php. (XMPP will ignore site update requests for sites that it's not handling.)

Limitations:
* only implemented for stomp queue manager so far
* --update may not yet handle a changed server name properly
* --restart won't reload PHP code files that were already loaded at startup. Still need to stop and restart the daemons from 'outside' when updating code base.
2010-01-26 11:49:49 -08:00
Brion Vibber
99866a459b Fix for stuck queue messages: wrap processing in stomp transactions so our lack of an ACK if PHP dies actually triggers redelivery.
Previously, messages once delivered would just get stuck in the queue seemingly forever if they never got ACKed.
Note this could lead to partial duplication, for instance if the OMB or Twitter queue handlers die after 1/2 of the outgoing sends.

Recommendations:
* catch exceptions more aggressively within queue handlers (so only PHP fatal errors are likely to kill in the middle)
* for processing that involves sending to multiple clients, consider a second queue similar to the XMPP output, eg for OMB:
 - first queue gets delivery list and builds message data, enqueueing it for each target address
 - second queue can handle each individual outgoing message (and attempt redelivery etc separately)

This would also protect better against a recurring error preventing delivery in the second part, and could spread out any slow sends over multiple threads.
2010-01-22 12:35:05 -08:00
Brion Vibber
0e852def6a XMPP queued output & initial retooling of DB queue manager to support non-Notice objects.
Queue handlers for XMPP individual & firehose output now send their XML stanzas
to another output queue instead of connecting directly to the chat server. This
lets us have as many general processing threads as we need, while all actual
XMPP input and output go through a single daemon with a single connection open.

This avoids problems with multiple connected resources:
* multiple windows shown in some chat clients (psi, gajim, kopete)
* extra load on server
* incoming message delivery forwarding issues

Database changes:
* queue_item drops 'notice_id' in favor of a 'frame' blob.
  This is based on Craig Andrews' work branch to generalize queues to take any
  object, but conservatively leaving out the serialization for now.
  Table updater (preserves any existing queued items) in db/rc3to09.sql

Code changes to watch out for:
* Queue handlers should now define a handle() method instead of handle_notice()
* QueueDaemon and XmppDaemon now share common i/o (IoMaster) and respawning
  thread management (RespawningDaemon) infrastructure.
* The polling XmppConfirmManager has been dropped, as the message is queued
  directly when saving IM settings.
* Enable $config['queue']['debug_memory'] to output current memory usage at
  each run through the event loop to watch for memory leaks

To do:
* Adapt XMPP i/o to component connection mode for multi-site support.
* XMPP input can also be broken out to a queue, which would allow the actual
  notice save etc to be handled by general queue threads.
* Make sure there are no problems with simply pushing serialized Notice objects
  to queues.
* Find a way to improve interactive performance of the database-backed queue
  handler; polling is pretty painful to XMPP.
* Possibly redo the way QueueHandlers are injected into a QueueManager. The
  grouping used to split out the XMPP output queue is a bit awkward.
2010-01-21 22:40:35 -08:00
Brion Vibber
598072468c --xmpp-only hack for queuedaemon.php to run separate queue daemon with only xmpp threads 2010-01-15 11:13:06 -08:00
Brion Vibber
2f32181c93 Keep handler registration per-site to fix queue registration in mixed config environment 2010-01-14 13:22:33 -08:00
Brion Vibber
ec145b73fc Major refactoring of queue handlers to support running multiple sites in one daemon.
Key changes:
* Initialization code moved from common.php to StatusNet class;
  can now switch configurations during runtime.
* As a consequence, configuration files must now be idempotent...
  Be careful with constant, function or class definitions.
* Control structure for daemons/QueueManager/QueueHandler has been refactored;
  the run loop is now managed by IoMaster run via scripts/queuedaemon.php
  IoManager subclasses are woken to handle socket input or polling, and may
  cover multiple sites.
* Plugins can implement notice queue handlers more easily by registering a
  QueueHandler class; no more need to add a daemon.

The new QueueDaemon runs from scripts/queuedaemon.php:

* This replaces most of the old *handler.php scripts; they've been refactored
  to the bare handler classes.
* Spawns multiple child processes to spread load; defaults to CPU count on
  Linux and Mac OS X systems, or override with --threads=N
* When multithreaded, child processes are automatically respawned on failure.
* Threads gracefully shut down and restart when passing a soft memory limit
  (defaults to 90% of memory_limit), limiting damage from memory leaks.
* Support for UDP-based monitoring: http://www.gitorious.org/snqmon

Rough control flow diagram:
QueueDaemon -> IoMaster -> IoManager
                           QueueManager [listen or poll] -> QueueHandler
                           XmppManager [ping & keepalive]
                           XmppConfirmManager [poll updates]

Todo:

* Respawning features not currently available running single-threaded.
* When running single-site, configuration changes aren't picked up.
* New sites or config changes affecting queue subscriptions are not yet
  handled without a daemon restart.
* SNMP monitoring output to integrate with general tools (nagios, ganglia)
* Convert XMPP confirmation message sends to use stomp queue instead of polling
* Convert xmppdaemon.php to IoManager?
* Convert Twitter status, friends import polling daemons to IoManager
* Clean up some error reporting and failure modes
* May need to adjust queue priorities for best perf in backlog/flood cases

Detailed code history available in my daemon-work branch:
http://www.gitorious.org/~brion/statusnet/brion-fixes/commits/daemon-work
2010-01-12 20:45:09 -08:00
Evan Prodromou
ae883ceb9b change controlyourself.ca to status.net 2009-08-25 18:19:04 -04:00
Evan Prodromou
d35b2d3f3c change laconi.ca to status.net 2009-08-25 18:16:46 -04:00
Evan Prodromou
c8b8f07af1 change Laconica and Control Yourself to StatusNet in PHP files 2009-08-25 18:12:20 -04:00
Evan Prodromou
0828fde51c one more shot at servicing queues 2009-07-09 15:25:59 -04:00
Evan Prodromou
43e0b308fd Revert "Let the queue handlers drain their xmpp queues"
This reverts commit fc3442a041.
2009-07-09 13:39:22 -04:00
Evan Prodromou
fc3442a041 Let the queue handlers drain their xmpp queues 2009-07-09 13:26:09 -04:00
Evan Prodromou
031146f4c7 yet another select() refinement 2009-07-09 12:49:37 -04:00
Evan Prodromou
eccab87044 slightly more robust select() logic 2009-07-09 12:33:38 -04:00
Evan Prodromou
03200235b1 use select() to bring down xmpp latency 2009-07-09 12:09:20 -04:00
Evan Prodromou
1daad01f36 slightly better timing 2009-07-09 11:40:01 -04:00
Evan Prodromou
8aef0e4271 manually re-enqueue failed notices 2009-07-08 17:55:43 -04:00
Evan Prodromou
a626f32d8e log errors in handling notices 2009-07-08 01:36:12 -04:00
Evan Prodromou
23e6dafff6 better handling of frames and notices 2009-07-05 11:01:07 -04:00
Evan Prodromou
66a4a60e0b better debug logging in stomp queue manager 2009-07-04 01:43:18 -04:00
Evan Prodromou
f63702579a don't say we're connecting if we're not 2009-07-04 01:16:58 -04:00
Evan Prodromou
49c5c6f92b move handling code into queuemanager 2009-07-04 00:31:28 -04:00
Evan Prodromou
3e4be98ff6 add _queueName function 2009-07-03 10:05:07 -04:00
Evan Prodromou
e8f27025ba more logging in stompqueuemanager 2009-07-02 12:43:09 -04:00
Evan Prodromou
2325d934a8 add fail() method to stompqueuemanager 2009-07-01 12:10:11 -04:00
Evan Prodromou
7b66a12913 save frames for StompQueueManager 2009-07-01 11:10:23 -04:00
Evan Prodromou
e5b758dbbe start of queuemanager code 2009-06-28 14:38:31 -04:00