BenV's notes

Browser ad blocking

by on Jul.29, 2009, under Software

I don’t always have to be negative about stuff… ads, for instance. They’re a terrible cancer!

Take a reasonably fast site, tag in a few ads and *boom*. Your head explodes in anger when you accidentally mouse over some keyword/item that pops up a flash thingy… or maybe not flash but css mouseover crap… or whatever it is they do these days. Not only that, it takes 5 minutes to load the site instead of 2 seconds, because the adservers are swamped and very slow.

So at some point doubleclick.net got me kinda pissed off, so I added a nice 127.0.0.1 entry for them in my DNS. (I love my own dns resolvers/servers). However, this still requires my browser to look up their stupid server and then ask 127.0.0.1 to retrieve something. Obviously 127.0.0.1 doesn’t have a webserver running, so it’ll get a connection refused if you’re lucky. Add in a firewall and the packet might even get dropped, and it’ll wait for a timeout, stalling and slowing your page down even more!

A better solution was needed, so finally I headed for Squid. Needless to say I went for the bleeding edge version, which is also why I wanted to post this, because I felt like updating it again yesterday. Anyway, building and installing it was pretty straightforward as soon as you’ve figured out how to deal with that svn fuckup called Bazaar. (Yeah, I’m sure tons of people love it, for me it’s just another weirdass version management tool that doesn’t do what I want… feels like git, but more svn-like. I’ll stick to mercurial for my own stuff, thank you.) After installing it on my server and tweaking the default configuration to fit my system, it simply worked. It even played nice with daemontools! (Just give it a -N when starting it so it sticks in the foreground.)

However, 2 issues remained. The first issue was that it didn’t block anything yet. Easily fixed: simply add an “ads” ACL and deny access to everything that matches it. Something like this:
acl ads dstdom_regex -i "/etc/squid/adservers.list"
deny_info ERR_ACCESS_DENIED ads
http_access deny ads

This defines an access control list that matches every destination in the adservers.list file (more on that in a bit). Anything matching it that gets requested through the proxy is answered with a big “ACCESS DENIED” page.
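To make the matching a bit less magical, here is a rough JavaScript sketch of what a case-insensitive dstdom_regex ACL does with a requested host name. This is not Squid's code: the patterns and the function name matchesAds are made up for illustration, and the real patterns come from the adservers.list file.

```javascript
// Hypothetical patterns in the spirit of a dstdom_regex list.
// They are NOT copied from the real adservers.list.
const adPatterns = [
  "(^|\\.)doubleclick\\.net$",
  "(^|\\.)ads\\.example\\.com$",
];

// Compile once, case-insensitive, like the -i flag on the acl line.
const adRegexes = adPatterns.map((p) => new RegExp(p, "i"));

// True when the destination host matches any pattern, i.e. when
// "http_access deny ads" would refuse the request.
function matchesAds(host) {
  return adRegexes.some((re) => re.test(host));
}

console.log(matchesAds("ad.DoubleClick.net")); // true
console.log(matchesAds("www.example.org"));    // false
```

The (^|\.) prefix is there so that a pattern only matches the domain itself or a subdomain, and not some unrelated host that merely ends in the same letters.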

Whoops, we don’t want a big access denied page for ads, so leave out the deny_info or put your own error page there if you feel like customizing your ads. Maybe a little html file that has a smiley face wherever an ad was supposed to be? The possibilities are limitless! *MWHAHAHA*

“Yeah, that’s cute, but where do you get that adservers.list?”

Obviously I’m too lazy to type that adservers list myself, so some googling turned up this nice piece of info: http://pgl.yoyo.org/adservers/#withsquid.

I have a little script in cron that fetches that adservers file every day. The script is on the page I just linked.

Goody! Now you just kick squid in the balls, svc -t /var/service/squid here, and the ads are blocked.

Be aware that this is not the only list of adservers. For instance, there’s this huge thing called Shalla’s Blacklist that you can download and add as well. Simply make a new acl for the list you want to include, point it to the file, and kick squid in the nuts. Note that not all lists contain regexps; for shalla you probably want something like: acl shalla_bla dstdomain "/BL/some/file/in/there"
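The difference between the two acl types matters when you mix lists. As a rough model (again not Squid's actual code, and dstdomainMatch is a made-up name), a dstdomain entry is compared by name instead of being run as a regexp: an entry with a leading dot covers the domain and all its subdomains, an entry without one covers only that exact host.

```javascript
// Rough model of a single Squid dstdomain entry:
//   ".example.com" matches example.com and any subdomain,
//   "example.com"  matches only that exact host name.
function dstdomainMatch(entry, host) {
  const e = entry.toLowerCase();
  const h = host.toLowerCase();
  if (e.startsWith(".")) {
    // Either the bare domain, or something ending in ".example.com".
    return h === e.slice(1) || h.endsWith(e);
  }
  return h === e;
}

console.log(dstdomainMatch(".example.com", "ads.example.com")); // true
console.log(dstdomainMatch(".example.com", "badexample.com"));  // false
console.log(dstdomainMatch("example.com", "ads.example.com"));  // false
```

So feeding a plain domain list to dstdom_regex, or a regexp list to dstdomain, will quietly match the wrong things. Check what format a list is in before picking the acl type.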

However, there’s still this tiny annoying thing: people have to enter your proxy into their browser manually. While it seems like an easy thing to do, people are stupid. Especially other people. Not only that, I’m lazy, so I don’t want to do more than click the ‘enable proxy’ button in opera. So onto the proxy autoconfiguration!

For this to work there are 2 methods, based on DNS and DHCP. Since they both suck I implemented both here. Simple steps:

  • In /etc/dhcpd.conf, add in the global part:  option wpad code 252 = text;
  • In the shared-network part if you have it, otherwise in the global part I suppose, add option wpad "http://your.local.webserver.nl/proxy.pac";

That covers the DHCP part. Make sure you have a webserver running on the url you put there. Important: drop a .htaccess file there with something like:


ForceType application/x-ns-proxy-autoconfig


Why? Because browsers are braindead, all of them. If they don’t see this MIME type they’ll silently refuse to handle the thing. It’s the “silently” part that annoys me.

On to the DNS method:

  • Make sure your /etc/resolv.conf gets a ‘search my.domain.nl’ directive, either through DHCP or manually. The goal is that when you type “ping wpad” it tries to ping wpad.my.domain.nl, which makes things easier.
  • Add a DNS entry for your domain so that wpad.my.domain.nl resolves to your webserver above.

If you did what I meant, you are now able to wget http://wpad/wpad.dat and get a 404. “Great!”. Yeah, I’m sure you’re happy. You should be able to see this in the webserver logs.
Time to put a file there, make it something like this:

function FindProxyForURL(url, host)
{
return "PROXY 192.168.1.1:3128; DIRECT";
}

The above example is the simplest you can get, and assumes your squid is running on 192.168.1.1. It instructs browsers that accept this file to first try the proxy and, if that fails, make a direct connection.
Personally I symlinked this wpad.dat to proxy.pac so both can be fetched from the webserver, but that’s up to you.
Try it out first with a file this simple, because a syntax terror will make it fail. There is a wpad.dat tester out there somewhere, but I assume you’re not a complete and utter moron and can handle a few lines of garbagescript. For more complex examples there are plenty of sites out there that describe this wpad.dat idiocy.
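As a taste of what “more complex” can look like, here is a sketch of a wpad.dat that keeps local traffic away from the proxy. It assumes squid on 192.168.1.1:3128 and a local domain of my.domain.nl, like the examples in this post; adjust both to your network. It sticks to plain string operations on purpose, so you can also run it outside a browser to test it.

```javascript
// Sketch of a slightly smarter wpad.dat / proxy.pac.
// Assumptions: squid listens on 192.168.1.1:3128 and the local
// domain is my.domain.nl. Change both for your own setup.
function FindProxyForURL(url, host) {
  var h = host.toLowerCase();

  // Plain host names ("wpad", "fileserver") and localhost go direct.
  if (h.indexOf(".") === -1 || h === "localhost" || h === "127.0.0.1") {
    return "DIRECT";
  }

  // Hosts inside the local domain go direct as well.
  var local = ".my.domain.nl";
  if (h === "my.domain.nl" || h.slice(-local.length) === local) {
    return "DIRECT";
  }

  // Everything else: try the proxy first, fall back to direct.
  return "PROXY 192.168.1.1:3128; DIRECT";
}
```

Because it is just a function, you can paste it into any JavaScript console and call FindProxyForURL("http://www.example.com/", "www.example.com") to see what a browser would be told.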

Finally! It should work now! Start up your favorite version of opera, I’m using build 4478 right now ;), and enable the proxy autoconfiguration checkbox. For the autoconfiguration URL you can use http://wpad/wpad.dat if it doesn’t work when left empty.
Run a “tail -f /var/log/squid/access.log” and you should see stuff coming in. Ads fully denied. AAHHAHAHAHA *evil laughter*.




