A fascinating post-mortem on high-profile network failures:
This post is meant as a reference point – to illustrate that, according to a wide range of accounts, partitions occur in many real-world environments. Processes, servers, NICs, switches, local and wide area networks can all fail, and the resulting economic consequences are real. Network outages can suddenly arise in systems that are stable for months at a time, during routine upgrades, or as a result of emergency maintenance. The consequences of these outages range from increased latency and temporary unavailability to inconsistency, corruption, and data loss. Split-brain is not an academic concern: it happens to all kinds of systems – sometimes for days on end. Partitions deserve serious consideration.

Contrary to the implications in the AWS EC2 documentation, reserved instances do not have more power or a more generous resource allocation – they are a billing mechanism that ensures customers with large server allocations can bring that quantity up at will. Over a term of years, reserved instances can be up to 70% cheaper.
CPU timeslicing only occurs on micro instances.

High Scalability has posted an excellent comparison of large data sizes, starting from the humble Byte and finishing with the incredible Domegemegrottebyte.
Thanks to Varnish, W3 Total Cache, CloudFlare and Blitz.io, this blog is able to serve considerably more requests on cheap hardware than it would without heavy caching.
The Varnish v3 /etc/varnish/default.vcl used is here (unsetting superfluous cookies gives the greatest Varnish performance gain).

I received an HTTP 500 response from YouTube today after submitting a search for Shlomo (an excellent beatboxer and human voice percussionist), along with the cryptic error message below.
It looks like potentially sensitive debug data have been serialised and encrypted into the error page, but it’s not immediately obvious why this is fed back to the user and not logged internally. Perhaps this error:
- occurred in an isolated system component without write permission to persistent storage
- is a trigger condition for a reroute from an overloaded component (possibly an aggregation service?)
- is infrequent enough to be considered non-critical
- is an undesired, undiagnosed emergent property of their architecture
- is unexpected, consequently not routed to the correct engineer and mothballed
Whatever it is, there isn’t any direction on how to contact the “highly trained monkeys” so most users won’t bother. The error is either considered minor or unfixable.
Serving 2 billion videos a day is non-trivial, but for any site without such well-publicised architectural prowess, giving users encrypted debug messages and no method to report them is considered bad form!
500 Internal Server Error
Sorry, something went wrong.
A team of highly trained monkeys has been dispatched to deal with this situation.
If you see them, show them this information:
UOWacxCYxzsYHkLrZEkPJ42v33Rm1C1wn5HkrqJyEFGNk3WfVhFWNNNBnXif PBdrNCrIESBIpTwnU8M1R3DMOew5Oz8CKKbgOM_HzNYITN09NXUAidiDBQ5a 0ZSK0S-qKr51QgiwuVU-D5Luof6lVbO37ec5N5yiOoTY_4uOSR8_aofoZNS6 NGu5e2AfuOefgFoi6oAF2szkUjebTZkRu6Sty3F1VWCdDEbKlu1nUTmwubGE oEI-kRNhKrH9pWdb1D1AXLmD1cpKBgdLyUZkPZw0x9sj1uJaPoG_MkbcRggX glS0FWsoKxmzLFOpuCw5vV1KWx2qtA5OHUhc4SFopbXbdt_2xAf6wLeX-INq h_Mpy_I3U39wUdi0gCnTlQP6lOFlVY6j9oa4-Qoo3VB5qcKQPgNIRRcYExcv OGJjBX1Y4dtku649z_HqQZ7aMPL-XnY0AGISfFhmqmlifBXyqJKccWViybTa KOeAJglse94TgnlA_ZfWbqmUJqHQ4TZnS_5AMraiI6elzvk0NHoa7-ACj5X3 U7zMbfqsz-MLHTpBD0i84lG5mNsAbbEwiKOaE5mYyvFvGjV2nBLqfooHtmLG nq6nbTAOSmA22ci3eKrsWahmyC1XA07zhihpwN8bnjU1WXMBIeImhaMgxXSb YLg9yUzRoooHR245bqBfLo6FYOyl4ip8YV8vvzy4j-gl1EuCGdN63RGr1WLi 6jyqmIgnBN0aOQ3s9q9_wgcVMqkl_xNIG1NTWRtcaWQ42vh4GKDtgnUNB7F2 VVZJaztX2K_4CWGkkcs9a29Ep10OWJyoRXqy_AiFuIK7PVnHVcDFXJIpLhg0 YvVyhSdhOMeK3fe2gaoS-ZDbEBEZlj2iPpl01PWmP-jkCKBCtXe7um1jWIhg alCsqFgJeu2kfPh2Z37n71568tNxDauE0tA5aK8fAdJSgUt2P3eSEBj1xsm0 XgOO07Xwm4NBexDKFmkx9lQq-oSBk0zCK35ELAPJ2dJkt-4VVOH54wxbXJIj iRRZq0wSJtaa65v9EQLAzHg8XZOOVe5G5lIu9VL8iciQ5IoP4RK9bBXotEkG jElPUoITJEw7o6_sh17hSd3eFpCkX982res3tFMh7vSkVGpw-o0eLSTnlcZm EKmJCiTcqGBAdmoiFJ3e79BoJH5b84wCMDgr6vzjyFZmuzywJLuuVL61Xit1 9NlxLoBSNJ7klsD6-UqFC7zJu3Tij47lv9elsgPvvhCJMKcaMtKRO9bhBmTh 9ucAoKqR24lEQARzi6gYYvDyxv4U-4il7HLflAnIUUjrbFgx4pLYiVoNFxfo 8VlC-SbQbcJpZbsDqM3NiIXKs_zoOLYPITFxfM94MYBlvxI1rvYvbQtmaWcF JE1DDs90IS9NZHMWIQ_YW9R1MHb_Zx_C4dalLPfqKrAitYydQuMhvuq4baPg oPzt0zzl04azsAtHrhnWrGNzCf8AP9uMTx-pEXcRcUO3t_wtPXypDtuK_fBp bEJAioBndGfpsD-XUO9NGdvfLJUgAELNy1m9Hz_6w7tQ-NlkAviAqPz0kYEt AAboAZSjQDWOF5RK0p6kDdRfU3zqKMi9G_swc4S7EoN-sDwsPH7Pdp8y-wyg fYqObTNry8_ldQ7pEM4AEKBvVh5QZMYijztA3S8Jco-wEKo-Lt2WG2JNQHNP wMXtBJNp0M-hkHLQRzNrRdqObWU5OUvf88rjvQ7pDtui8KH4c7VnxIAYYxnu 1rXqZ0hdStTimt93WOGrS0-xf9nHtFgZ11gJTJwM1KII0nRnXEb4nP-hqkvU 8ifoTz-zougC-5UXTP3V7k_XCgpdbsuDNpfCv3V24H-d-WXZhWBz6R_j5ZdP viV7SqnTCuqOzOqCZykb3t6Bcd30olshmfA6XYNxfTVI1EdLkihMXUArNCZV 
xImCJfhpd_hoYlVy1XmXO6HTbnOobn8hXxDHWEu5OQzdIX-2sipGKC49_-60 M29bb18c060TZwhebR6BFhTDZd4BiH5ariv1hcMg1Ace-ClZzNM_GmRdK4va 1mjSjPt5XzobXyoGQqNw4JpSVag2r2P3WSQKGbjNc3HC0-9Fyg3zvft223Bt rM_ACshW6YbTg4IJFKtmPnSWBECZJjIndGUpyYG8zotdHDcpl3OnceH8IKem I_NY9gB-3o9o9U6qMf7HMLgzkL2nGdyPysdUFBM7BxYFZ0pZymg7VamZ6ckj _YrVv-XzHsZdBRWZgqCMdWj75O_ufJqwAV12oRlTm99HtUW-NdmCCrVBgGIy RF8nXd3PdHwopT-mHFR39xucPHS_UPFjTOZh6vUoLcYl64ot00YVjtMcKpmG wFHc2SHv5GqVqVb4cRfUi6--3F_TwqxoyrTRhRBOp1wi87Pcl7ISc1We4wVN EsFFe4J3rSzVGwcxLaX70XjYzUoV5RIoAGGh7ZX0BewbXotWRe1pGZ57BMve liFcBPSIkrxsxDwKku6CJV4Q4SPZ9Q64xK6abfeW4p0k38HlEyxmb8JjwFOA amXJWvcR0mFqe94SXwak9RDtZ1nRNY5JzIyjdMAa4WrcLWe53JZMeHjOkKcB bspm4zTRnugkCZXdoJZohcKnlvPg20ISuRUvyFEj07dPA2Xvhtr_aUpfvOCo EiVpoA0-2mGr7OIptFIC9l3oaOiHQdjdEPldpC8i_ufVc6lD8tjYvoqr0Klh E1cQFqFU5FUuCdxRXyG165Ex0QMKDdMelAisbDyyDTXdpCRMiD0SD3zwG6ac Gtk3xuj-1lZx9Q1_OytrEj8134JqYfhonArAvzC5FY8QFkHuq9b98ALsFfOP QTutc3sMVPjIdxmGvZ1L4jVTPYkU98ZvrmDLMaS-0MEcCIj-hXUXKH2Vqlxd K3hMJTqCRpv77XWUES57gWLQnAGg9VUVYJmeKCKLfTQWEeYLz6LqXaZuf_h_ QKlnjENZqJdu_H3Hh4brRlSXFYWReslG3elt2E3HGGkaVeH2Npfkbbn3_2Iq KsmHyVoGqTCR76tCZUEliBnyCerVPc7G1PFe5uj1TMTa34mHGWIIG9eeTjCI fMK90Z7evvkURGaJvsrSL8GntLKVhoFYFhjrzxE66H-47xOWMfe8yj7-tGcA sgbDavUPm3_sfEXBOSWSmKSEYj0aEGUToncKpu_3VdGSOUD70tyf3nGYKfe_ E4xG8QTEO4EE-U6X-_1w_ffeE5xu9tS-2R-McufulvBW9IUDYWnLxzay1bRm jdnt2zr_7GI-mewrbEtixOKUNmMaq3AlEgXdJ5Di5dsEh0gdCoaGUkYCowqR FQJwfyAyNdv5P-nmlRHEYG_ZrpDozmCjRKB2serGFjpNpXSkYC5_HGEwIQpI 4i4jfX2LG9A_iVjzx_rDWSiS2Dzr-Gs2HmlaCldHIQjl4fU5NLBPktTQt0vh owpAzHQxjTDqXZdc0pwEDPNiy5qJNs_uh-x-zNJra_gA4nQueB9jBOq7PZq8 bEIrEgVv

As a new PHP application outgrows a single server, the database is usually the first bottleneck. When the original server begins to struggle, the database is moved to a dedicated machine or two, and perhaps read-only slaves are added (this may require some application code changes – MySQL Proxy offers a neat alternative with read/write connection splitting). Both are relatively trivial and effective solutions until application traffic grows and the web server becomes the bottleneck.
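The application code change that read-only slaves may require, read/write splitting, can be sketched roughly as follows. This is an illustration in Python rather than PHP, and the `ConnectionRouter` class, its `route` method and the server names are all hypothetical. A real router would also need to consider replication lag and transactions, which this sketch ignores.

```python
import itertools


class ConnectionRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    READ_PREFIXES = ("SELECT", "SHOW")

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no read-only servers are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        if sql.lstrip().upper().startswith(self.READ_PREFIXES):
            return next(self._replicas)
        return self.primary


# Server names are made up for the example.
router = ConnectionRouter("db-primary", ["db-replica-1", "db-replica-2"])
```

Calling `router.route()` with a `SELECT` statement returns one of the read-only servers in turn, while any other statement is sent to the primary.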
RESTful Statelessness
HTTP conforms to the principles of REST, which require that state (i.e. a small amount of session data) “is kept entirely on the client” (original PhD). Adopting this precept improves the fault tolerance and scalability of a load-balanced web server architecture: as the user’s session is not stored on a single server, any server can fail with minimum impact. The user’s next request hits the load balancer and is routed to an active server, where their session continues from its last saved state. This configuration can be scaled easily using many identical machines, and it eliminates any individual web server as a single point of failure.
If a PHP application has deeply embedded usage of the $_SESSION superglobal, removing state is difficult. Instead, removing the dependency between a user’s session data and the single server it’s stored on achieves the same fault tolerance. By changing PHP’s session handler to write the serialised session to shared storage, state is still being stored by the application, but it has the properties of statelessness, so any server can handle the request. This is not true RESTful statelessness, in which small amounts of user data such as a primary key are stored client-side in cookies or headers, but it is a suitable alternative.
PHP’s default session storage mechanism uses files on the local filesystem. When the session starts, PHP locks the session file with the operating system’s flock() (file lock) call, which queues any subsequent requests until the current request has called session_write_close(), which releases the lock. This prevents race conditions between fast or concurrent requests (likely in AJAX-rich applications) that could result in unpredictable behaviour. It’s also important that all of a session’s locks are dropped when a connection to the session store is closed, in case the application crashes without releasing the lock. These essential behaviours are not replicated by some of the more frequently touted alternatives.
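The effect of moving session data to shared storage can be shown with a small sketch. It is written in Python rather than PHP, a plain dictionary stands in for the shared storage, and the `WebServer` class and its `handle` method are hypothetical names used only for illustration.

```python
class WebServer:
    """Hypothetical web server that keeps no session data of its own."""

    def __init__(self, name, shared_store):
        self.name = name
        self.store = shared_store  # stand-in for shared session storage

    def handle(self, session_id):
        """Load the session, count the page view, and write it back."""
        session = self.store.get(session_id, {"views": 0})
        session["views"] += 1
        session["last_server"] = self.name
        self.store[session_id] = session
        return session


shared_store = {}  # a plain dict stands in for the shared storage
web1 = WebServer("web1", shared_store)
web2 = WebServer("web2", shared_store)

web1.handle("abc123")           # the first request lands on web1
del web1                        # web1 fails
result = web2.handle("abc123")  # the next request is routed to web2
```

Because neither server holds the session itself, `web2` picks up the view counter where `web1` left it.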
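The two locking behaviours described above (a second request cannot take the lock while the first holds it, and the lock disappears when the holder's file handle is closed) can be observed with a short sketch. It uses Python's `fcntl.flock` on a temporary file rather than PHP itself, and it assumes a Unix-like system with a local temporary directory. `LOCK_NB` is used only so that the contention shows up as an error; without it the second call would simply wait, which is the queueing PHP relies on.

```python
import fcntl
import os
import tempfile

# A temporary file stands in for a session file such as sess_<id>.
fd, path = tempfile.mkstemp(prefix="sess_")
os.close(fd)

first = open(path, "r+")   # request 1 opens the session file
second = open(path, "r+")  # request 2 opens the same file

fcntl.flock(first, fcntl.LOCK_EX)  # request 1 takes the exclusive lock

try:
    # LOCK_NB makes the call fail instead of waiting for the lock.
    fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
    second_blocked = False
except BlockingIOError:
    second_blocked = True

# Closing the handle drops the lock, as happens when a process dies.
first.close()
fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)  # succeeds now
second_acquired = True

second.close()
os.remove(path)
```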
Memcached
PHP’s memcached extension makes it trivial to plug memcached into PHP’s session handler. It’s possible to run a pool of memcached servers; however, memcached is a RAM-only cache. This means that when a server goes down its data is lost, moving the single point of failure from the web server’s filesystem to the cache server’s volatile RAM.
There is generally less RAM available than disk, increasing the chance of cache overflow (old but valid data being dropped in favour of newer data under low memory conditions). It’s possible to implement simple locks using atomic increments, although this doesn’t provide request queueing. Since version 3.0.4 (which is still beta) the memcached extension supports session locking and node mirroring, so although it’s currently not robust, there is future potential.
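A simple lock built on atomic increments might look like the sketch below. It is written in Python, and `FakeCache` is an in-memory stand-in invented for this example, not a real memcached client. The helper names `try_lock` and `unlock` are also hypothetical. One assumption worth flagging: a real memcached `incr` only works on a key that already exists, so a real implementation would have to create the counter first.

```python
import threading


class FakeCache:
    """In-memory stand-in for a cache with atomic counters (not a real client)."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def incr(self, key):
        """Atomically add one to the counter and return the new value."""
        with self._guard:
            self._data[key] = self._data.get(key, 0) + 1
            return self._data[key]

    def decr(self, key):
        """Atomically subtract one from the counter and return the new value."""
        with self._guard:
            self._data[key] = self._data.get(key, 0) - 1
            return self._data[key]


def try_lock(cache, session_id):
    """Return True if this caller now holds the lock for the session."""
    if cache.incr("lock:" + session_id) == 1:
        return True
    cache.decr("lock:" + session_id)  # undo the failed attempt
    return False


def unlock(cache, session_id):
    """Release a lock previously won with try_lock."""
    cache.decr("lock:" + session_id)


cache = FakeCache()
first = try_lock(cache, "abc123")   # the first request wins the lock
second = try_lock(cache, "abc123")  # the second is refused, not queued
unlock(cache, "abc123")
third = try_lock(cache, "abc123")   # the lock is free again
```

The refused caller gets an immediate `False` and has to poll and retry, which is the missing request queueing. A lock holder that crashes also never calls `unlock`, so a real version would need some form of expiry.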
DRBD
DRBD in dual-primary mode provides a redundant, drop-in replacement for the local filesystem. By mounting a clustered filesystem such as OCFS onto DRBD, the only configuration change is updating the session storage path. Although DRBD is robust, it cannot currently be expanded to more than two nodes, and configuring a heartbeat to enforce failover can be complicated.
There are other flavours possible, including GlusterFS (which supports flock() natively but has significant overhead for small files) and NFS (which cannot release the locks of crashed clients until they’ve reconnected, and has problematic lock recovery), but they introduce substantial complexity – something the default session handler gracefully avoids.
Sharedance
Sharedance mimics PHP’s filesystem locking on a remote server, keeping the file descriptor open until the client closes the session or the connection drops. The PHPDance library supports writing to a pair of servers (although this could easily be modified to add further redundancy) and fails over if the first server is unreachable. Because this extension writes sequentially but reads from only a single server, a lock is lost when the primary server fails, although the data is safe on the other server.
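The write-to-both, read-from-one pattern described above can be sketched as follows. This is a Python illustration with in-memory stand-ins, not the PHPDance code. The `Node` class and the `write_session` and `read_session` functions are hypothetical names.

```python
class Node:
    """Hypothetical in-memory stand-in for one session storage server."""

    def __init__(self):
        self.data = {}
        self.up = True

    def write(self, key, value):
        if not self.up:
            raise ConnectionError("node unreachable")
        self.data[key] = value

    def read(self, key):
        if not self.up:
            raise ConnectionError("node unreachable")
        return self.data[key]


def write_session(nodes, key, value):
    """Write to every reachable node, one after another."""
    for node in nodes:
        try:
            node.write(key, value)
        except ConnectionError:
            continue  # skip an unreachable node and carry on


def read_session(nodes, key):
    """Read from the first node that answers, trying the primary first."""
    for node in nodes:
        try:
            return node.read(key)
        except ConnectionError:
            continue
    raise ConnectionError("no session server reachable")


primary, secondary = Node(), Node()
write_session([primary, secondary], "abc123", "user_id=42")
primary.up = False  # the primary fails
recovered = read_session([primary, secondary], "abc123")
```

The session data survives on the second node, but nothing in this scheme carries a lock held on the failed primary across to the survivor.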
Both Sharedance and PHPDance are old projects; this is a good sign of their stability but as they are not actively maintained any newly discovered bugs are likely to be the discoverer’s responsibility. Here’s a configuration guide.
NoSQL
Many NoSQL solutions regard record locking and atomicity as overhead. MongoDB has implemented atomic writes, which facilitate application-level locking, but there is no native queuing. NoSQL datastores rarely require authentication by default, which could leave session data vulnerable to manipulation or reading if the local network is not secured.
Corey Ballou presents a partial solution that may be satisfactory for low traffic or non-critical session handling on his blog.
Hazelcast
Hazelcast is an open source DHT – a decentralised, fault-tolerant, massively scalable key-value store. It supports locking and queueing, authentication, inter-node encryption, and can speak the memcached protocol. All this would make Hazelcast the perfect distributed session store; however, the memcached protocol doesn’t support the concept of locking, so the PHP memcached extension can’t be used.
Hazelcast exposes a RESTful API that doesn’t currently support locking either. Until the API is upgraded or a native PHP extension is released, Hazelcast remains tantalisingly out of reach.
MySQL
Web applications already have an authenticated connection to a database in the cluster which (if required) is highly available. Almost all relational databases can perform read locking comparable to that of a filesystem, can queue requests for a locked resource and can be tuned to serve the data they store quickly. MySQL fulfils all these prerequisites.
Using a database server as a session store slightly increases load – concurrent requests to the locked resource result in multiple wait locks. Persistent database connections should be avoided so the server can release locks in case of an application crash.
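The way a database lock blocks a competing request, and is released when the holder's connection closes, can be demonstrated with a short sketch. It is written in Python, and SQLite stands in for MySQL so that no server is needed; that substitution is an assumption of the example. SQLite locks the whole database here, whereas MySQL's InnoDB engine can lock an individual row (for instance with `SELECT ... FOR UPDATE`). The table and column names are made up.

```python
import os
import sqlite3
import tempfile

# SQLite stands in for MySQL so the example runs without a server.
db_dir = tempfile.mkdtemp()
db_path = os.path.join(db_dir, "sessions.db")

# isolation_level=None leaves transaction control to the explicit BEGIN;
# timeout=0 makes a blocked connection fail at once instead of waiting.
first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
second = sqlite3.connect(db_path, timeout=0, isolation_level=None)

first.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, data TEXT)")
first.execute("INSERT INTO sessions VALUES ('abc123', 'user_id=42')")

first.execute("BEGIN IMMEDIATE")  # request 1 takes the write lock

try:
    second.execute("BEGIN IMMEDIATE")  # request 2 wants the same lock
    second_blocked = False
except sqlite3.OperationalError:
    second_blocked = True

# Closing without committing rolls back and frees the lock, much as a
# dropped connection does after an application crash.
first.close()
second.execute("BEGIN IMMEDIATE")
row = second.execute("SELECT data FROM sessions WHERE id = 'abc123'").fetchone()
second.execute("COMMIT")
second.close()

os.remove(db_path)
os.rmdir(db_dir)
```

With a non-zero timeout the second connection would wait for the lock instead of failing, which corresponds to the wait locks mentioned above.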
Conclusion
Volatile storage may be suitable for session data if your application can repopulate the session in case of failure (if you don’t store the session key, this would also require the user to log in again), and memcached v3 offers some promising redundancy features. Hazelcast is the most fault-tolerant and easily scalable solution, but is yet to integrate sufficiently with PHP to be viable.
For critical systems, only Sharedance and MySQL effectively mimic the local filesystem’s functionality. Sharedance is almost an exact functional clone of the default storage mechanism, but the PHPDance interface has clunky redundancy (sequential writes and constant retries of the primary server, even in case of failure) and the project is not actively maintained. MySQL is able to duplicate the default session storage behaviour, offers various levels of redundancy that are abstracted from the session handler’s implementation, and is likely to be already available to the majority of web applications.
In the next post I will write a fault tolerant MySQL session handler optimised for high traffic volume.