Proxy Auto Config File

Introduction

Here I will go over some information on proxy auto configuration (.pac) files. I built and used one while working for a global company, and it was an excellent way of saving money while at the same time enhancing our proxy infrastructure. Bear in mind that it has been a while since I did this, so most of it comes from memory. I advise you NOT to take anything here as gospel but rather to try it yourself first in a test environment.

What is it ?

Well, the official documentation for .pac files is hosted by Netscape, who I believe came up with the idea. In simple terms, a .pac file is a JavaScript file implementing a single function, “FindProxyForURL(url, host)”, which the browser calls for every web request it makes. The function is expected to return a directive telling the browser how to get the page in question. The directive is one of the following:

  • DIRECT - go direct to the server for this page
  • PROXY host:port - use the defined proxy for this page
  • SOCKS host:port - use the defined SOCKS server for this page

That’s it. It really is simple. So simple, in fact, that many people fail to see the point until you point out that the return value can be a semicolon-separated list of proxies for the browser to “try” in order when getting a page, and that the function can be as complex as you want, which opens up all sorts of possibilities for deciding which proxy to use. Some of the things you can use to decide on routing are:

  • Protocol of the request
  • IP address of the client
  • IP address of the server
  • Client or server DNS domain
  • The content of the requested URL
  • The date, day or time the request is made
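
To make this concrete, here is a minimal sketch of a .pac file that decides by protocol and by server DNS domain and then falls back to a list of proxies. The names pxy1 and pxy2 are borrowed from the examples further down the page; ftppxy and .corp.example.com are made-up placeholders, and the helper hostEndsWith() is my own, not a built-in.

```javascript
// Hypothetical helper: true if host ends with the given DNS suffix.
function hostEndsWith(host, suffix) {
  return host.length >= suffix.length &&
         host.substring(host.length - suffix.length) === suffix;
}

function FindProxyForURL(url, host) {
  // Decide on the protocol of the request (ftppxy is a placeholder name).
  if (url.substring(0, 4) === "ftp:") {
    return "PROXY ftppxy:8080";
  }
  // Decide on the server's DNS domain (placeholder intranet domain).
  if (hostEndsWith(host, ".corp.example.com")) {
    return "DIRECT";
  }
  // Everything else: try pxy1, then pxy2, then go direct.
  return "PROXY pxy1:8080; PROXY pxy2:8080; DIRECT";
}
```

The browser only ever calls FindProxyForURL(); any helpers are purely for your own convenience.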

Why use one ?

The answer to this is simple. If you want to add resilience, load balancing and/or complex routing of requests to your proxy infrastructure for next to nothing, then use a .pac file. I will present examples of each here. Or, if you don’t want any of that, you can use it to simplify managing your proxy exception list if you have one: when you change a .pac file, all people have to do is restart their browsers to get the new settings, rather than log out and in again. That makes pushing updates quicker for, and more transparent to, your users. It can even be used to separate “ownership” of proxy administration from NT user administration. You don’t have to give someone access to edit people’s profiles to manage proxy configurations if you have a .pac file. Simply edit the profiles to point at the .pac file and give the proxy administrator access to that file.

Requirements

You only need two things to use a .pac file. The first is obviously a proxy server to point your clients at for certain requests. The second is a web server to serve up the .pac file. The latter requirement is not strictly true: in most browsers you can specify a file system location instead, such as smb://myfileserver/globalshare/proxy.pac, or \\myfileserver\globalshare\proxy.pac for Windows machines. I tend to prefer using a web server, but that’s just my preference for interoperability and security.

Resilience

This is simple. If you have more than one proxy server, which is itself cheap as chips if you use Squid on Linux with old hardware, you just create a .pac file that dumbly returns both proxies. The browser tries them in the order listed, so if your primary proxy fails, all clients will start using the secondary proxy. You can even return a last-ditch DIRECT option, assuming that would be any use to anyone.
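
A sketch of the "dumb" resilience file, assuming two proxies called pxy1 and pxy2 listening on port 8080 (the names used later on this page; substitute your own):

```javascript
function FindProxyForURL(url, host) {
  // Same answer for every request: primary, then secondary, then direct.
  // Drop the trailing DIRECT if going direct is no use on your network.
  return "PROXY pxy1:8080; PROXY pxy2:8080; DIRECT";
}
```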

Load Balancing

Again, this is simple. Just create a function that returns one or other of your proxies at random. Use the JavaScript Math object, be aware of how many servers you have, and check your maths is right so that every proxy can actually be picked. If you want to be a little more “sensible” you can send all requests for a given page to the same proxy, while still splitting load. This is explained below in the super proxy script.
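
A minimal sketch of random load balancing, again assuming two proxies named pxy1 and pxy2. The list is kept inside the function rather than as a global variable, for the reason given under Bugs and Gotchas.

```javascript
function FindProxyForURL(url, host) {
  var proxies = ["PROXY pxy1:8080", "PROXY pxy2:8080"];
  // Math.random() is >= 0 and < 1, so after Math.floor() the index is
  // always between 0 and proxies.length - 1 inclusive.
  var index = Math.floor(Math.random() * proxies.length);
  return proxies[index];
}
```

Using proxies.length rather than a hard-coded 2 means the maths stays right when you add a third server.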

Complex Request Routing

This is a simple case of knowing what you want routed through which proxy and then writing the logic in JavaScript. One example might be to route all internal page requests direct to the server if you have a reasonable internal network, and only route external page requests through your proxy. The isInNet(host, pattern, mask) function available to .pac files is ideal for this. If you want to get even more complex, for example if you have multiple sites, you could even use the client’s IP address, via the myIpAddress() .pac function, to route requests for pages at your other sites through a proxy as well. This would allow one .pac file for an entire organisation, though you would obviously want to account for WAN link failures and thus probably push your .pac file to local web servers.
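
Here is a rough sketch of that multi-site idea, assuming two sites with made-up address ranges (10.1.0.0/16 and 10.2.0.0/16) and made-up proxy names. The decision logic lives in a hypothetical chooseRoute() function that takes the network test as a parameter, so it can be tried outside a browser with a stand-in for isInNet(); FindProxyForURL() just hands it the real built-ins.

```javascript
// inNet must behave like the .pac built-in isInNet(host, pattern, mask).
function chooseRoute(host, clientIp, inNet) {
  // All addresses and proxy names here are placeholders.
  var siteMask = "255.255.0.0";
  var atSiteA = inNet(clientIp, "10.1.0.0", siteMask);
  var localNet = atSiteA ? "10.1.0.0" : "10.2.0.0";
  var localProxy = atSiteA ? "PROXY pxy-a:8080" : "PROXY pxy-b:8080";

  // Servers at the client's own site are fetched directly.
  if (inNet(host, localNet, siteMask)) {
    return "DIRECT";
  }
  // Other-site and external pages go through the client's local proxy.
  return localProxy;
}

function FindProxyForURL(url, host) {
  // isInNet() and myIpAddress() are supplied by the browser.
  return chooseRoute(host, myIpAddress(), isInNet);
}
```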

All of the above

If you want to, as I did, provide resilience, load balancing and complex request routing, you simply combine all of the above. So, when load balancing, just replace your “PROXY pxy1:8080” with “PROXY pxy1:8080; PROXY pxy2:8080” and your “PROXY pxy2:8080” with “PROXY pxy2:8080; PROXY pxy1:8080” to have each proxy act as backup in case of failure, but still load balance. For complex routing, simply put the load-balanced returns inside the routing logic.
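
Put together, a sketch might look like the following. The routing rule here (host names without a dot are treated as intranet servers, which is the same test the built-in isPlainHostName() performs) is only an assumption to keep the example short; your own rules go in its place.

```javascript
function FindProxyForURL(url, host) {
  // Routing: an unqualified host name is assumed to be an intranet server.
  if (host.indexOf(".") === -1) {
    return "DIRECT";
  }
  // Load balancing with resilience: pick a proxy at random, and list
  // the other one after it as the backup.
  var choices = [
    "PROXY pxy1:8080; PROXY pxy2:8080",
    "PROXY pxy2:8080; PROXY pxy1:8080"
  ];
  return choices[Math.floor(Math.random() * choices.length)];
}
```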

The BIG daddy

The best example I have found of how to load balance is the “Super Proxy” page hosted at Sharp. This includes the excellent URLhash() and URLhash2() functions used to hash a request for load balancing. This is superior to random load balancing as it ensures a given request is always routed to the same proxy server. This optimises the caching on your proxies: with two proxies, each effectively caches only half of all requests, rather than both probably caching a large percentage of all requests. Obviously, if you are using ICP or the like, the effect is mitigated by the proxies themselves, but you would still save on inter-proxy request traffic.
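
To show the idea only, here is a sketch of hash-based balancing. The simpleHash() function below is my own stand-in and is NOT the URLhash() or URLhash2() code from the Super Proxy script; any function that always turns the same URL into the same number would do.

```javascript
// Stand-in hash (not the Super Proxy one): a small rolling hash that
// always gives the same number for the same text.
function simpleHash(text) {
  var hash = 0;
  for (var i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) % 65536;
  }
  return hash;
}

function FindProxyForURL(url, host) {
  var choices = [
    "PROXY pxy1:8080; PROXY pxy2:8080",
    "PROXY pxy2:8080; PROXY pxy1:8080"
  ];
  // The same URL always lands on the same entry, so each page is
  // normally cached on only one proxy; the other is just the backup.
  return choices[simpleHash(url) % choices.length];
}
```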

Why did I use it ?

I used a .pac file configuration to solve the following problems:

  • We did not want to cache access to our intranet servers ... no point
  • We wanted to cache and “control” access to the internet (we bandwidth limited and authenticated most sites)
  • Certain requests had to be routed through a corporate proxy as, for reasons best known to someone (I hope), we were unable to route to these web servers from our networks.
  • We wanted to send users through their “local” proxies when accessing most sites to save WAN bandwidth
  • Some proxies outside our control did not function properly for certain sites, and so we had to send some requests through our own specific proxies, not the “local” ones.
  • We wanted to have resiliency such that if a “local” proxy failed another one would be used
  • We wanted to load balance in an extensible manner so as to maximise web access speeds
  • There were differences in internal and external DNS for some sites requiring routing them differently depending on what people were used to or needed access to.
  • Not all of our proxies would support FTP (someone being silly with some of our firewalls) and so we needed to route all FTP requests through specific proxies regardless of other routing considerations.

I achieved all of this with an admittedly rather complex .pac file. I also successfully tested a variety of failure scenarios to ensure “smooth operation” from the user perspective. The only problem I was unable to resolve using a .pac file was that of sending users to a different proxy dependent on the NT domain they were logged into. Unfortunately, that information is not accessible from JavaScript in a .pac file. This was only a problem in the first place due to messy and/or unimplemented trust relationships between domains, and so is unlikely to be encountered by others.

Alternatives

The only alternative I am aware of for getting resilience and/or load balancing is to use some form of clustering or failover. I have seen the following all used instead of a .pac file, and NONE of them offer the same complex routing options and ALL of them cost many thousands of pounds to implement:

  • Veritas cluster server running proxy software
  • Proxy servers load balanced using hardware (in my instance Foundry ServerIrons - great for what they do, but expensive compared to .pac files)
  • REALLY expensive Microsoft ISA server clustering and/or failover (not just the cost of several ISA servers)

The only advantage any of these have over a .pac file implementation is if you have automated systems needing web access that don’t understand .pac files but are critical. For this situation an HA proxy implementation is obviously required. But I think it is fairly rare to be in that situation without already having load balancers or clustering for something else that you could use for your proxies too. Even then, it might still be desirable to use a .pac file for client browsers to get the complex routing logic and cache optimisation it provides.

Bugs and Gotchas

I present here a list of the bugs and gotchas I have come across when trying to use a .pac file, or do interesting things with one, in the hope that it helps others doing anything similar.

  • Mac OS X is a bit funny with global variables, at least in Tiger. If you have ANY global variables in your .pac file, either make them local variables or assign them inside the “top most” FindProxyForURL() function without the “var” prefix, so that they behave as you would expect.
  • Bear in mind you can’t do everything in a .pac file that you might normally do in a JavaScript file. One example is the lack of any object for finding out the browser type. Oddly enough, though, I have found certain ActiveX functions to work, if memory serves.
  • Once IE has “failed” a proxy, it tends to need a restart before it will use it again. So when a failed proxy is brought back on-line, either don’t break the working one till at least the next day, or have your helpdesk aware that they need to tell anyone with IE problems to close and re-open IE to get back to work.
  • Make absolutely sure you check the validity of your code by developing with a good IDE and by trying it. If you get something slightly wrong you may well end up with everything simply being requested direct. This may work on your site, but will most likely cause problems, and nothing annoys users quicker than not having web access.
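
As one way of "trying it" before users do, you could run your function in any JavaScript shell and check what it returns. The helper below is a hypothetical sketch of my own, not part of the .pac format: it checks that a returned string is made up only of the three directives listed near the top of this page. It is deliberately strict, so it rejects a trailing semicolon and does not handle IPv6 addresses.

```javascript
// Hypothetical checker: true if every semicolon-separated entry is
// DIRECT, "PROXY host:port" or "SOCKS host:port".
function isValidPacResult(result) {
  var entryPattern = /^(DIRECT|(PROXY|SOCKS) [^\s;:]+:\d+)$/;
  if (typeof result !== "string" || result === "") {
    return false;
  }
  var entries = result.split(";");
  for (var i = 0; i < entries.length; i++) {
    // Strip surrounding spaces so "a ; b" and "a; b" are both accepted.
    var entry = entries[i].replace(/^\s+|\s+$/g, "");
    if (!entryPattern.test(entry)) {
      return false;
    }
  }
  return true;
}
```

For example, isValidPacResult("PROXY pxy2:8080:") is false, which is exactly the sort of slip that is easy to make and hard to spot by eye.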
 
technical/automatic_proxy_configuration.txt · Last modified: 2006/02/07 11:32 by daleroberts