Here I will go over some information on proxy auto-configuration (.pac) files. I built and used one while working for a global company, and it was an excellent way of saving money while at the same time enhancing our proxy infrastructure. Bear in mind it has been a while since I did this, so most of it is coming from memory. I advise you NOT to take anything here as gospel but rather to try it yourself first in a test environment.
Well, the official site for .pac files is here, hosted by Netscape, who I believe came up with the idea for .pac files. In simple terms, a .pac file is a JavaScript file implementing a single function, “FindProxyForURL(url, host)”, which the browser calls for every web request it makes. The function is expected to return a directive telling the browser how to get the page in question. The directive is one of the following: DIRECT (fetch the page without a proxy), PROXY host:port (use the named proxy) or SOCKS host:port (use the named SOCKS server).
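A minimal sketch of such a function, showing the DIRECT and PROXY directive forms. The proxy name proxy.example.com:8080 is a placeholder I have made up for illustration, not a real server.

```javascript
// Minimal .pac file: the browser calls this function for each request.
// "proxy.example.com:8080" is a hypothetical placeholder proxy.
function FindProxyForURL(url, host) {
  // Fetch anything on the local machine without a proxy.
  if (host === "localhost" || host === "127.0.0.1") {
    return "DIRECT";
  }
  // Send everything else through the proxy.
  return "PROXY proxy.example.com:8080";
}
```

I use only old-style JavaScript (var, plain functions) in these sketches, on the assumption that some browsers run .pac files in a fairly basic JavaScript engine.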
That’s it. It really is simple. So simple, in fact, that many people have not been able to see the point, until you point out that the return value can be a semicolon-separated list of proxies to “try” in order to get a page, and that the function can be as complex as you want, opening up all sorts of possibilities for deciding which proxy to use. Some of the things you can use to decide on routing are:
Why use a .pac file? The answer is simple. If you want to add resilience, load balancing and/or complex routing of requests to your proxy infrastructure for next to nothing, then use a .pac file. I will present examples of each here. Or, if you don’t want any of that, you can use it to simplify managing your proxy exception list if you have one. If you change a .pac file, all people have to do is restart their browsers to get the new settings, rather than log out and in again, which makes pushing updates quicker for, and more transparent to, your users. It can even be used to separate “ownership” of proxy administration from NT user administration. You don’t have to give someone access to edit people’s profiles to manage proxy configurations if you have a .pac file. Simply edit the profiles to point at the .pac file and give the proxy administrator access to that file.
You only need two things to use a .pac file. The first is obviously a proxy server to point your clients at for certain requests. The second is a web server to serve up the .pac file. The latter requirement is not strictly true: in most browsers you can specify a file system location instead, such as smb://myfileserver/globalshare/proxy.pac, or \\myfileserver\globalshare\proxy.pac for Windows machines. I tend to prefer using a web server, but that’s just my preference for interoperability and security.
Resilience is simple. If you have more than one proxy server, which is itself cheap as chips if you use Squid on Linux with old hardware, you just create a .pac file that dumbly returns both proxies. That way, if your primary proxy fails, all clients will start using the secondary proxy. You can even return a last-ditch DIRECT option, assuming that would be of any use to anyone.
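As a rough sketch of that resilience setup, the function below ignores its arguments and always returns the same list. The names pxy1 and pxy2 and the port 8080 are example values only; substitute your own proxies.

```javascript
// Failover only: the browser tries each entry in order, moving to the
// next one when a proxy does not respond. pxy1/pxy2 are example names.
function FindProxyForURL(url, host) {
  return "PROXY pxy1:8080; PROXY pxy2:8080; DIRECT";
}
```

Drop the trailing DIRECT entry if your clients cannot reach the outside world without a proxy anyway.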
Load balancing is, again, simple. Just create a function that returns one or other of your proxies at random. Use the JavaScript Math object, be aware of how many servers you have, and check your maths is right. If you want to be a little more “sensible”, you can send all requests for a given page to the same proxy while still splitting the load. This is explained below in the super proxy script.
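A minimal sketch of random load balancing, assuming two proxies with the example names pxy1 and pxy2. The PROXIES array is my own naming, not part of the .pac format.

```javascript
// Example proxy list; add or remove entries to match your own servers.
var PROXIES = ["PROXY pxy1:8080", "PROXY pxy2:8080"];

function FindProxyForURL(url, host) {
  // Math.random() returns a value in [0, 1), so the floor of the product
  // is always a valid index from 0 to PROXIES.length - 1.
  var index = Math.floor(Math.random() * PROXIES.length);
  return PROXIES[index];
}
```

Using PROXIES.length rather than a hard-coded number keeps the maths right when you add a server.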
Complex routing is a simple case of knowing what you want routed through which proxy and then writing the logic in JavaScript. One example might be to route all internal page requests direct to the server if you have a reasonable internal network, and only route external page requests through your proxy. The isInNet(host, pattern, mask) function of .pac files is ideal for this use. If you want to get even more complex, for example if you have multiple sites, you could even use the client’s IP address, via the myIpAddress() .pac function, to route requests for pages at other sites through a proxy as well. This would allow one .pac file for an entire organisation, though you would obviously want to account for WAN link failures and thus probably push your .pac file to local web servers.
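The multi-site idea can be sketched roughly as follows. The two subnets (10.1.0.0/16 and 10.2.0.0/16), the mapping of pxy1 and pxy2 to sites, and the SITES and siteOf names are all assumptions made up for illustration. isInNet() and myIpAddress() are supplied by the browser, so this only runs as-is inside a .pac environment.

```javascript
// Hypothetical site table: one subnet and one local proxy per site.
var SITES = [
  { net: "10.1.0.0", mask: "255.255.0.0", proxy: "PROXY pxy1:8080" },
  { net: "10.2.0.0", mask: "255.255.0.0", proxy: "PROXY pxy2:8080" }
];

// Return the site whose subnet contains the address, or null if none does.
// Note that isInNet() resolves host names, which costs a DNS lookup.
function siteOf(address) {
  for (var i = 0; i < SITES.length; i++) {
    if (isInNet(address, SITES[i].net, SITES[i].mask)) {
      return SITES[i];
    }
  }
  return null;
}

function FindProxyForURL(url, host) {
  var clientSite = siteOf(myIpAddress());
  var hostSite = siteOf(host);

  // Pages served from the client's own site are fetched directly.
  if (clientSite !== null && hostSite === clientSite) {
    return "DIRECT";
  }
  // Pages at another site, and all external pages, go through the proxy
  // at the client's site; unknown clients fall back to the first proxy.
  return (clientSite || SITES[0]).proxy;
}
```

With this table, a client at 10.1.5.5 would fetch 10.1.0.9 directly but go through pxy1 for 10.2.0.9 or for any external address.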
If you want to, as I did, provide resilience, load balancing and complex request routing, you simply combine all the above. So, when load balancing, just replace your “PROXY pxy1:8080” with “PROXY pxy1:8080 ; PROXY pxy2:8080” and your “PROXY pxy2:8080” with “PROXY pxy2:8080 ; PROXY pxy1:8080” to have each proxy act as backup in case of failure, but still load balance. For complex routing, simply put the load balance returns inside the routing logic.
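Putting the pieces together might look something like the sketch below. The routing rule used here (plain host names without dots are internal and fetched directly) is only an assumed example, and ROUTES is my own name. isPlainHostName() is one of the helper functions the browser provides to .pac files.

```javascript
// Each entry names a preferred proxy followed by its backup.
var ROUTES = [
  "PROXY pxy1:8080; PROXY pxy2:8080",
  "PROXY pxy2:8080; PROXY pxy1:8080"
];

function FindProxyForURL(url, host) {
  // Routing logic first: plain host names (no dots) are assumed to be
  // internal servers and are fetched directly.
  if (isPlainHostName(host)) {
    return "DIRECT";
  }
  // Load balancing second: pick one of the two orderings at random, so
  // each proxy takes about half the load and backs up the other.
  return ROUTES[Math.floor(Math.random() * ROUTES.length)];
}
```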
The best example I have found of how to load balance is the “Super Proxy” page hosted at Sharp here. This includes the excellent URLhash() and URLhash2() functions used to hash a request for load balancing. This is superior to random load balancing as it ensures a given request is always routed to the same proxy server. This optimises the caching on your proxies, as they each effectively cache only half of all requests, rather than each probably caching a large percentage of all requests. Obviously, if you are using ICP or the like, the effect is mitigated by the proxies themselves, but you would still save on inter-proxy request traffic.
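To illustrate the idea only, here is a rough sketch of hash-based selection. The pickIndex() helper is a simple stand-in I have made up (it sums character codes); it is NOT the URLhash() or URLhash2() function from the Super Proxy script, and pxy1 and pxy2 are example names.

```javascript
// Example proxy list, without the "PROXY " prefix.
var PROXIES = ["pxy1:8080", "pxy2:8080"];

// Hypothetical stand-in hash: sums the character codes of the URL,
// reduced modulo the number of proxies at each step.
function pickIndex(url, count) {
  var sum = 0;
  for (var i = 0; i < url.length; i++) {
    sum = (sum + url.charCodeAt(i)) % count;
  }
  return sum;
}

function FindProxyForURL(url, host) {
  // The same URL always hashes to the same starting proxy.
  var first = pickIndex(url, PROXIES.length);
  // List the remaining proxies after it so they act as backups.
  var order = [];
  for (var i = 0; i < PROXIES.length; i++) {
    order.push("PROXY " + PROXIES[(first + i) % PROXIES.length]);
  }
  return order.join("; ");
}
```

Because the choice depends only on the URL, every client sends a given page to the same proxy first, which is what gives the caching benefit.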
I used a .pac file configuration to solve the following problems:
I achieved all of this with an admittedly rather complex .pac file. I also successfully tested a variety of failure scenarios to ensure “smooth operation” from the user perspective. The only problem I was unable to resolve using a .pac file was that of sending users to a different proxy depending on the NT domain they were logged into. Unfortunately, that information is not accessible from JavaScript in a .pac file. This was only a problem in the first place due to messy and/or unimplemented trust relationships between domains, and so is unlikely to be encountered by others.
The only alternative I am aware of for getting resilience and/or load balancing is to use some form of clustering or failover. I have seen the following all used instead of a .pac file, and NONE of them offer the same complex routing options and ALL of them cost many thousands of pounds to implement:
The only advantage any of these have over a .pac file implementation is if you have automated systems needing web access that don’t understand .pac files but are critical. For this situation an HA proxy implementation is obviously required. But I think it is fairly rare to be in that situation without already having load balancers or clustering for something else that you could use for your proxies too. Even then, it might still be desirable to use a .pac file for client browsers to get the complex routing logic and cache optimisation it provides.
I present here a list of the bugs and gotchas I have come across when trying to use a .pac file or do interesting things with one, in the hope that it helps others doing anything similar.