Cooperative Web Caching Using Server-Directed Proxy Sharing: Ph.D. Dissertation Proposal
The World Wide Web suffers from scaling and reliability problems due to overloaded servers and congested routers. Caching at local proxy servers helps, but cannot satisfy more than a third to half of requests; the remainder must still be sent to the origin HTTP server or to a higher-level cache. This dissertation develops a protocol for cooperative distributed caching between proxy servers, called server-directed proxy sharing (SDP). The goals of SDP are to reduce network contention, end-user response time, and denial-of-service incidents by distributing requests to caches on routes with more available bandwidth. SDP is designed to match characteristics of the HTTP protocol and of web traffic patterns. For example, SDP takes advantage of web page structure by having the server piggyback a list of cache sites onto the page's HTML text. This allows the requestor to retrieve embedded images simultaneously from several cache sites, which decreases response time and better distributes network traffic. Server contact is further reduced by having caches lazily share directory information when they return requested objects.
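The piggyback mechanism can be sketched as follows. This is a hypothetical illustration, not the proposed implementation: the function names, the response structure, and the round-robin assignment of images to caches are all assumptions made for the example.

```python
# Illustrative sketch of SDP piggybacking: the server attaches a list of
# cooperating cache sites to the HTML it returns, and the client spreads
# its requests for the page's embedded images across those caches in
# parallel. All names and structures here are assumed for illustration.
from concurrent.futures import ThreadPoolExecutor

def serve_page(html, cache_sites):
    """Server side: piggyback the cache-site list onto the HTML response."""
    return {"html": html, "cache_sites": cache_sites}

def fetch_from_cache(cache, image):
    """Stand-in for an HTTP GET of `image` sent to proxy cache `cache`."""
    return f"{image} via {cache}"

def fetch_embedded_images(response, images):
    """Client side: retrieve embedded images from several caches at once,
    assigning images to the advertised cache sites round-robin."""
    caches = response["cache_sites"]
    with ThreadPoolExecutor(max_workers=len(caches)) as pool:
        futures = [pool.submit(fetch_from_cache, caches[i % len(caches)], img)
                   for i, img in enumerate(images)]
        return [f.result() for f in futures]

response = serve_page("<html>...</html>", ["cacheA", "cacheB"])
results = fetch_embedded_images(response, ["a.gif", "b.gif", "c.gif"])
```

Because the image fetches proceed concurrently against different caches, no single server or route carries all of the page's traffic, which is the load-spreading effect described above.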
Dissertation work will consist of three phases. The first task is to characterize HTTP server workloads by collecting local traces, analyzing them, and comparing the results to published studies. In the second phase, an event-driven simulation with analytical workloads will be used to evaluate the cache system. In the final phase, a prototype version will be implemented.
The contribution of this dissertation lies both in the protocol design and in the simulation work. Previous simulation studies of Web caching systems modeled single Web servers without including network traffic, and used Web server traces to drive the simulation. Such simulations cannot compute response time or network congestion, and typically rely upon server load metrics such as bytes/sec and requests/sec to evaluate cache designs. I intend to develop a more predictive network cache simulator by 1) using analytical rather than trace workloads, and 2) adapting a network simulator to model interactions of network traffic generated by multiple Web servers and multiple cache sites. The ultimate goal of this simulation will be to determine whether we can provide realistic predictions of response time and network congestion for various caching protocols and workloads.
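A minimal event-driven simulation of the kind described above might look like the following sketch. The analytical workload (Poisson request arrivals) and the latency and hit-ratio constants are illustrative assumptions, not parameters or results of the proposed simulator, and the single-cache model omits the multi-server network interactions the real simulator would capture.

```python
# Minimal event-driven cache simulation sketch: requests arrive according
# to an analytical (Poisson) model rather than a trace; a cache hit costs
# less than a miss forwarded to the origin server. All constants below are
# assumed for illustration only.
import heapq
import random

random.seed(1)
HIT_LATENCY = 0.05    # seconds, assumed cost of a proxy-cache hit
MISS_LATENCY = 0.50   # seconds, assumed cost of going to the origin server
HIT_RATIO = 0.35      # assumed fraction of requests satisfied by the cache

# Generate 1000 request-arrival events with exponential interarrival times
# (a Poisson process at 10 requests/second) on a time-ordered event heap.
events = []
t = 0.0
for rid in range(1000):
    t += random.expovariate(10.0)
    heapq.heappush(events, (t, rid))

# Process events in time order, accumulating response time.
total_response = 0.0
n = 0
while events:
    now, rid = heapq.heappop(events)
    hit = random.random() < HIT_RATIO   # abstract hit/miss model
    total_response += HIT_LATENCY if hit else MISS_LATENCY
    n += 1

mean_response = total_response / n
```

Unlike a trace-driven server simulation, an event-driven model of this shape can report response time directly, and extending the miss path with modeled link delays is what would let it also report network congestion.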
SDP is designed to fit into the existing Web infrastructure; it operates at the application layer and can be built on top of existing HTTP servers and proxy servers. The advantage of this approach is that SDP works with current network-layer protocols and Web software, making it easy to distribute and install a prototype across the Internet.