[NRVR-Members] On GMail and Unwanted E-Mails
Thomas Tweeks Weeks
tom at theweeks.org
Fri Jan 25 23:53:57 CST 2019
Thanks for the info Adrien. Spam filtering is a bit off topic from "rockets", but hey, since it pertains to NRVR email, and there seems to be a somewhat interested geek audience here...
Having worked for a fairly large internet/email hoster (and running my own mail servers) for a number of years.. I can tell you that 60-70% of all email that flows over the Inet is spam.. and that with any decently architected email system there are multiple layers of spam blocking beyond just the MTA/Bayesian filters that you see in email headers.
The innermost is usually the per-inbox or domain-admin level controlled white lists, black lists and sometimes a per-inbox hash or token Bayesian filter. This is where users often get to "report as spam" to help train their personalized token/hashing system. This is the "spit and polish" of blocking that is user defined.. and that the infrastructure admins usually don't care about. :)
Next layer out (often done on intermediary MTAs) is usually where most of the system-wide Bayesian token scoring and filtering that you see in the headers happens (e.g. SpamAssassin, commercial scrubbing services, etc). This is the layer that takes a big chunk out of that incoming flood of spam that made it past the outer gates.
Then lastly you usually have the big outer walls.. of graylisting, IP "sender reputation" and scoring that can use actively maintained black/white lists, custom software and/or lots of third party offerings that do some really fancy reputation, DKIM, SPF, and DMARC anti-spam checks that big companies are willing to pay a lot of money for. This is the layer that really knocks a big chunk out of the 70% of the incoming spam.
Those last two layers are the big ones that get you most bang for the buck.. but as anyone in the industry can tell you, blocking spam has become a lucrative game of chess (on both sides of the equation).
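To make the three layers above a bit more concrete, here is a toy Python sketch of the order they run in: outer gate first, then system-wide scoring, then the per-inbox lists. Everything in it (the names, the IP list, the word weights, the threshold, and letting the user's lists override the system score) is made up for illustration and is not how any particular provider does it.

```python
from dataclasses import dataclass, field

# Hypothetical data, for illustration only.
BAD_IPS = {"203.0.113.7"}                      # outer wall: sender reputation list
SPAMMY_WORDS = {"lottery": 2.5, "winner": 1.5}  # middle layer: system-wide token weights
SCORE_THRESHOLD = 4.0                           # middle layer: spam cutoff


@dataclass
class Message:
    sender_ip: str
    sender: str
    body: str


@dataclass
class Mailbox:
    whitelist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)


def outer_gate(msg):
    """Outer wall: pass/fail on sender reputation before anything else runs."""
    return msg.sender_ip not in BAD_IPS


def system_score(msg):
    """Middle layer: add up the weights of known spammy tokens in the body."""
    return sum(SPAMMY_WORDS.get(word, 0.0) for word in msg.body.lower().split())


def classify(msg, mailbox):
    """Run the layers from the outside in and return 'reject', 'spam' or 'inbox'."""
    if not outer_gate(msg):
        return "reject"
    verdict = "spam" if system_score(msg) >= SCORE_THRESHOLD else "inbox"
    # Inner layer: the user's own lists get the final say (an assumption here).
    if msg.sender in mailbox.whitelist:
        verdict = "inbox"
    elif msg.sender in mailbox.blacklist:
        verdict = "spam"
    return verdict
```

The point of the ordering is cost: the cheap checks at the edge throw mail away before the more expensive scoring ever has to look at it.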
All that aside.. one of the oldest (and still a good indicator) of the super cheap "reputation checks" is simply comparing the forward and reverse DNS lookups of a mail sender.. After I moved most of my personal domain's email off the NRVR server (Rackspace hosts it now, as I got tired of playing chess with email), the NRVR mail was still going to/from the NRVR server, and I got lazy and let my old server's forward and reverse DNS records get out of sync. I didn't think it was a problem until you all started telling me that you had receivers who were blocking me as spam.
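For the curious, the forward/reverse comparison is simple enough to sketch in a few lines of Python. This is only a rough illustration of the idea, not what any real mail system runs: the function name and the injectable lookup parameters are my own invention (they let you try it without touching real DNS), and by default it falls back to the standard library's socket.gethostbyaddr and socket.gethostbyname_ex.

```python
import socket


def fcrdns_ok(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if the reverse (PTR) name of `ip` resolves forward to `ip` again."""
    # Default to the system resolver; callers may inject fakes for offline testing.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]
    try:
        hostname = reverse_lookup(ip)         # IP -> hostname (reverse lookup)
        addresses = forward_lookup(hostname)  # hostname -> list of IPs (forward lookup)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses:
        # a failed lookup in either direction counts as a failed check.
        return False
    return ip in addresses
```

Calling fcrdns_ok("192.0.2.10") with no extra arguments would ask your real resolver. A server whose forward and reverse records have drifted apart, like my old one, comes back False.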
The fact that it was tagged as spam without any SpamAssassin token/grading in the headers means that it was likely being tagged at the outer gates, the very edge of Google's mail systems.. which probably (like many other old mail systems out there) still check that the forward and reverse lookups match as an early pass/fail test.
Such simple checks will often do one of two things.. Places like Yahoo just drop the SMTP session and the email goes in the bit bucket.. while places like Google (who are a little smarter and don't want to lose hosting business by being too harsh) simply skip all further filtering systems and poop it directly into the end user's spam folder for them to figure out.
My hunch is that is what was happening. Though I'm probably wrong and someone who knows more about spam filtering than me (probably one of my old Racker friends) will now correct me in front of everyone.. :)
As I used to tell Rackers in my training classes..
The fastest way to discover the limits of your own knowledge is to open your mouth. :)
On Thursday, January 24, 2019 10:49am, "Adrien Drouault" <adrien.drouault at gmail.com> said:
Wrapping back to the "GMail sometimes flags NRVR list traffic as spam" conversation...
I've realized that it actually hasn't done it to me in quite a while. I've been looking at the headers of the things that do get tagged there, and discovered a few interesting things (well, interesting to the IT-geeks among us, at least).
There's nothing in the headers that says "this is spam". Actually, I have several examples with SPF, DKIM and ARC all "passing", but it's still flagged as spam.
Finding that, I did some research, and found that Google will flag things as spam that are "similar to things you've reported as spam in the past". Everything I've had from the list that got flagged, I used their "this is not spam" button, so, maybe they have some sort of Bayesian thing going on as well.
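As a rough picture of what "some sort of Bayesian thing" could look like, here is a tiny Python token classifier of the kind a "report spam" / "not spam" button might feed. It is purely a guess at the general technique (naive Bayes with add-one smoothing); the class and method names are hypothetical, and nothing here is known to match what Google actually does.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy per-user token filter trained by 'spam' / 'ham' (not spam) reports."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one reported message; label is 'spam' or 'ham'."""
        self.counts[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_log_odds(self, text):
        """Positive means 'looks more like reported spam', negative means ham."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        totals = {label: sum(c.values()) for label, c in self.counts.items()}
        # Smoothed prior from how many messages of each kind were reported.
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for token in text.lower().split():
            # Add-one smoothing so unseen tokens never give a zero probability.
            p_spam = (self.counts["spam"][token] + 1) / (totals["spam"] + len(vocab) + 1)
            p_ham = (self.counts["ham"][token] + 1) / (totals["ham"] + len(vocab) + 1)
            score += math.log(p_spam / p_ham)
        return score
```

Each click on "this is not spam" would amount to a train(text, "ham") call, which is why list traffic could stop getting flagged after enough of them.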
Long story short, it would definitely be interesting to see the headers on something that's from the list and gets flagged, but I think I won't be able to provide it at this point.
Quidquid latine dictum sit, altum videtur.