
Weblog entry #2 for miguel

Blacklist project
Posted by miguel on Mon 19 Feb 2007 at 22:16
Hello Debian friends,

It has been a long time since my last post, and Etch still hasn't been released. :-(

Well, this web site has a lot of sysadmins among its readers, so I'm writing to let you know about a project of mine that has now gone public: http://www.enterpriseblacklist.org.

Quoting the site:
The EBL (Enterprise Blacklist) offers a blacklist of domains, with free distribution. It is fed by collaborators, web robots and web crawlers. it has the objective to be an efficient list of domains that certainly network administrators want the users to remain distant.

We already have more than 1.5 million domains, and we started from scratch.

I want to have a lot of information, and right now I'm working on a robot to collect open proxies. Take a look!

Miguel

 

Comments on this Entry

Posted by Anonymous (76.188.xx.xx) on Tue 20 Feb 2007 at 03:25
I haven't gone to the site yet, but something came to mind right off the bat...

What, if any, methods of removal do you have? Are they automated or manual?


Posted by miguel (201.12.xx.xx) on Tue 20 Feb 2007 at 03:42
Both.
You can suggest a domain for removal, and the suggestion goes to a voting queue.
Every day, 10,000 domains are checked to see whether they resolve to an IP address. If we receive an NXDOMAIN error, the domain is removed. If the name server times out, the domain is disabled but will be tested again in 10 days.

I will try to make sure every domain is tested at least once every 10 days.
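
The removal rules described above could be sketched roughly like this. It is only an illustration, not the actual robot: `Entry`, `DnsResult` and `apply_check` are hypothetical names, the DNS lookup itself is left out so that only the decision logic is shown, and re-enabling a domain when it resolves again is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Dict, Optional

RETEST_DAYS = 10  # retest interval mentioned in the comment above


class DnsResult(Enum):
    RESOLVED = "resolved"
    NXDOMAIN = "nxdomain"
    TIMEOUT = "timeout"


@dataclass
class Entry:
    active: bool = True
    next_check: Optional[date] = None


def apply_check(blacklist: Dict[str, Entry], domain: str,
                result: DnsResult, today: date) -> None:
    """Update the blacklist in place after one DNS test of `domain`."""
    if result is DnsResult.NXDOMAIN:
        # The domain no longer exists: drop it from the list.
        blacklist.pop(domain, None)
        return
    entry = blacklist[domain]
    # A timeout disables the entry; a successful lookup keeps it active
    # (re-enabling on success is an assumption, not stated in the post).
    entry.active = result is not DnsResult.TIMEOUT
    # Either way, schedule the next test RETEST_DAYS from today.
    entry.next_check = today + timedelta(days=RETEST_DAYS)
```

A caller would run the real lookup, map the outcome to a `DnsResult`, and pass it to `apply_check` together with the current date.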

This process is automated. You can see the results of every test on this page: http://www.enterpriseblacklist.org/?q=blog/7

The test robot "blogs" the results every time it runs the tests.

On the main page there is a link at the top named "Log" that goes to this page.


Posted by Steve (80.68.xx.xx) on Tue 20 Feb 2007 at 11:33

Here's a fun question for you...

If you found a host on the net which was running port scans looking for open proxies - would you list it?

i.e., what exactly are the criteria for inclusion in your list? On your site I just see a list of categories, but no real detail. (I apologize if I've missed it.)

Steve


Posted by Anonymous (201.12.xx.xx) on Tue 20 Feb 2007 at 14:52
Good point!

Actually, my idea for open proxies is not to hammer random IPs on the net. It is just to make a robot that extracts the IPs from sites like http://www.proxy4free.com/page1.html or http://www.samair.ru/proxy/socks.htm. There are a lot of these sites that publish open proxies daily. It would be much more effective to do it this way, IMO: let them do the hard work, and we just collect the results, so you can block them on your proxy or firewall.
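
As a rough idea of what such a collector might do with a downloaded page, here is a minimal sketch. It assumes the page shows proxies as plain `ip:port` text, which may not hold for every list site; `extract_proxies` and `PROXY_RE` are made-up names, and fetching the page is left out.

```python
import ipaddress
import re
from typing import List, Tuple

# Matches "a.b.c.d:port"; validity of the octets is checked separately.
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")


def extract_proxies(page_text: str) -> List[Tuple[str, int]]:
    """Return unique (ip, port) pairs found in the text, in order of appearance."""
    found = []
    seen = set()
    for ip, port in PROXY_RE.findall(page_text):
        try:
            ipaddress.IPv4Address(ip)
        except ValueError:
            continue  # not a valid IPv4 address, e.g. 999.1.1.1
        port_no = int(port)
        if not 0 < port_no < 65536:
            continue  # port out of range
        if (ip, port_no) not in seen:
            seen.add((ip, port_no))
            found.append((ip, port_no))
    return found
```

The resulting pairs could then be written out in whatever format your proxy or firewall expects.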

I updated the FAQ with more information about the criteria for the blacklist.

What are the criteria for inclusion on the list?

That depends on the category or the source of the domain.

We have 2 robots that extract domains from Sedo and The Domain Name After Market, and mark them in the Parked category. Every domain listed on these sites is for sale; they have no real content, just advertisements. Another robot extracts, daily, the domains listed in the RSS feed of The Domain Name After Market.

EBL has a crawler whose mission is to find porn sites on the web. The crawler is still under development, but it is working quite well. It starts with a seed, provided manually, extracts all links from the seed, and follows them. If the domain of a followed link contains certain pre-defined words, like gangbang, adult, teens, etc., the domain is added to the blacklist; if not, the domain is sent to a queue. Each queued domain is then accessed through DansGuardian: if it is blocked, it is listed; if not, it is discarded. Then another cycle starts, and the seed is always a porn site.
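
The two-stage check described above might look something like the following sketch. It is not the crawler's real code: `classify_domains` and `KEYWORDS` are hypothetical, and `is_blocked` is a stand-in for whatever request is made through DansGuardian.

```python
from typing import Callable, Dict, Iterable, List

# Illustrative subset; the full pre-defined word list is not given in the post.
KEYWORDS = ("adult",)


def classify_domains(domains: Iterable[str],
                     is_blocked: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Split domains into 'listed' and 'discarded' using the two-stage check."""
    listed, queue, discarded = [], [], []
    for domain in domains:
        name = domain.lower()
        if any(word in name for word in KEYWORDS):
            listed.append(domain)   # stage 1: keyword hit in the domain name
        else:
            queue.append(domain)    # no hit: defer to the content filter
    for domain in queue:
        # stage 2: `is_blocked` stands in for a request through DansGuardian
        (listed if is_blocked(domain) else discarded).append(domain)
    return {"listed": listed, "discarded": discarded}
```

Passing the filter in as a function keeps the sketch independent of how the content filter is actually queried.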

You may ask yourself whether this crawler can make mistakes. Yes, it can. But there is a curious thing: porn sites link to porn sites, and the chance that a porn site links to a non-porn site is really, really small. Besides, the non-porn sites the crawler does find linked from porn sites are almost always the same ones, whatever the porn site. The porn crawler never goes more than one pass beyond the seed, which removes the chance of leaving the "porn to porn" link context.
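
To illustrate the "one pass from the seed" rule, here is a small sketch that only collects the domains linked directly from a seed page and never follows them any further. `LinkCollector` and `linked_domains` are made-up names, and downloading the seed page is left out.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def linked_domains(seed_url: str, seed_html: str) -> Set[str]:
    """Domains linked directly from the seed page (one pass, no recursion)."""
    collector = LinkCollector()
    collector.feed(seed_html)
    seed_host = urlparse(seed_url).hostname
    domains = set()
    for href in collector.hrefs:
        # Resolve relative links against the seed, then keep only foreign hosts.
        host = urlparse(urljoin(seed_url, href)).hostname
        if host and host != seed_host:
            domains.add(host)
    return domains
```

Because the returned domains are only classified and never used as new seeds automatically, the crawl stays within a single hop of a known seed.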


Posted by simonw (84.45.xx.xx) on Wed 21 Feb 2007 at 18:06
The presentation of the website is good, but the written content is terrible.

"Help the fight against users" ?!

"It has the objective to be an efficient list of domains that certainly network administrators want the users to remain distant."

Machine translation in use, by any chance?

What you seem to be producing is a list of domains, registered for the purpose of advertising, with no original content.

Get a proofreader, please...


Posted by miguel (201.12.xx.xx) on Wed 21 Feb 2007 at 18:21
Yeah, the content is not that good. Sorry, my bad; I will try to improve it.

Right now, most of the content is parked domains. I am doing this because these domains are hard to filter and make my web crawler waste a lot of time parsing them. There are more than 120,000 working porn sites in there too. Now that I have a good sample of parked domains (over 1.8 million), my porn crawler skips these domains and finds the "useful" junk that I want to list.

At this very moment I'm finishing the open proxy robot, and before you ask whether the bot goes wild around the net knocking on IPs: NO, it doesn't.

Quoting my other post:

"Actually, my idea for open proxies is not to hammer random IPs on the net. It is just to make a robot that extracts the IPs from sites like http://www.proxy4free.com/page1.html or http://www.samair.ru/proxy/socks.htm. There are a lot of these sites that publish open proxies daily. It would be much more effective to do it this way, IMHO: let them do the hard work, and we just collect the results, so you can block them on your proxy or firewall."
