In the past, most anti-spam programs simply used a catalogue of keywords to identify spam. Although a good set of keywords can catch many spam emails, relying on this strategy alone has disadvantages. Spammers can easily bypass such a filter by altering the message slightly. Worse, the filter can mistakenly block legitimate emails that happen to contain a listed word.
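Both weaknesses are easy to reproduce. The following is a minimal sketch of a keyword filter, not any particular product's implementation; the `SPAM_KEYWORDS` catalogue and the `keyword_filter` function are hypothetical names made up for illustration.

```python
import re

# Hypothetical keyword catalogue; a real list would be far longer.
SPAM_KEYWORDS = {"lottery", "mortgage", "prize"}


def keyword_filter(message: str) -> bool:
    """Return True if any catalogue word appears in the message."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    return not SPAM_KEYWORDS.isdisjoint(words)


# A plain spam message is caught.
print(keyword_filter("You won the lottery"))

# Changing one character is enough to slip past the catalogue.
print(keyword_filter("You won the l0ttery"))

# A legitimate message is blocked because it contains a listed word.
print(keyword_filter("Your mortgage application was approved"))
```

The second call shows the bypass and the third shows a false positive: the filter has no way to weigh a listed word against the rest of the message.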
Nowadays, Bayesian filtering is used to spot spam. The principle behind this approach is that most events are dependent, which means past events can be used to estimate the probability of current ones. If certain pieces of text have appeared in spam emails but not in legitimate ones, then a new message containing that text is more likely to have been sent by a spammer.
To filter emails using Bayesian technology, you need a database of words collected from both spam and legitimate emails. The filter counts how often each word occurs in spam versus legitimate messages. When a new email arrives, it is broken into words, and the most significant ones (those that point most strongly towards either spam or legitimate mail) are singled out. From these words, the Bayesian filter calculates the probability that the message is spam. If that probability exceeds a threshold, say 0.9, the message is classified as spam.
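The steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the method of any specific product: the `BayesianFilter` class, the add-one smoothing, the limit of 15 significant words and the tiny training set are all choices made for the example. The formula used to combine per-word probabilities is one common option among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Break a message into lowercase words."""
    return re.findall(r"[a-z0-9']+", text.lower())


class BayesianFilter:
    def __init__(self, threshold=0.9):
        self.threshold = threshold      # 0.9 as in the text above
        self.spam_counts = Counter()    # word -> number of spam messages containing it
        self.ham_counts = Counter()     # word -> number of legitimate messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        words = set(tokenize(text))     # count each word once per message
        if is_spam:
            self.spam_counts.update(words)
            self.n_spam += 1
        else:
            self.ham_counts.update(words)
            self.n_ham += 1

    def word_spam_prob(self, word):
        # Add-one smoothing (an assumption) keeps the result strictly between 0 and 1.
        in_spam = (self.spam_counts[word] + 1) / (self.n_spam + 2)
        in_ham = (self.ham_counts[word] + 1) / (self.n_ham + 2)
        return in_spam / (in_spam + in_ham)

    def spam_probability(self, text, top_n=15):
        probs = [self.word_spam_prob(w) for w in set(tokenize(text))]
        # The most significant words are those furthest from the neutral value 0.5.
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        probs = probs[:top_n]
        spam_score = math.prod(probs)
        ham_score = math.prod(1 - p for p in probs)
        return spam_score / (spam_score + ham_score)

    def is_spam(self, text):
        return self.spam_probability(text) > self.threshold


# Hypothetical training data, far smaller than a real word database.
f = BayesianFilter()
for msg in ["win cash now", "cheap pills win big", "claim your free cash prize"]:
    f.train(msg, is_spam=True)
for msg in ["meeting agenda for monday", "your mortgage statement is attached", "lunch on monday"]:
    f.train(msg, is_spam=False)

print(f.is_spam("win free cash now"))
print(f.is_spam("agenda for monday meeting"))
```

With this training data the first message scores above the 0.9 threshold and the second scores well below it. A production filter would be trained on thousands of messages and would usually handle headers, rare words and numerical underflow more carefully.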
With outdated spam filtering software, a business can get many false positives. A word commonly used by spammers, such as ‘mortgage’, can cause legitimate messages sent to a financial company to be mistakenly marked as spam. A Bayesian filter does not reach a conclusion based on a single word. Software that implements Bayesian filtering will also adapt to a particular company’s habits.
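A small worked example may make both points concrete. The per-word probabilities below are invented for illustration, not measured values, and `combine` is a hypothetical helper that uses one common formula for merging per-word spam probabilities.

```python
import math


def combine(probs):
    """Merge per-word spam probabilities into one probability for the message."""
    spam_score = math.prod(probs)
    ham_score = math.prod(1 - p for p in probs)
    return spam_score / (spam_score + ham_score)


# Hypothetical values for a filter trained on general mail:
# 'mortgage' looks very spammy, the other words look ordinary.
general = {"mortgage": 0.95, "statement": 0.30, "account": 0.35, "attached": 0.20}

# Hypothetical values after training on a financial company's own mail,
# where 'mortgage' appears in many legitimate messages.
financial = dict(general, mortgage=0.40)

p_general = combine(general.values())
p_financial = combine(financial.values())

print(round(p_general, 2), round(p_financial, 2))
```

Even with ‘mortgage’ rated at 0.95, the ordinary words pull the combined score far below a 0.9 threshold, so the message is not classified as spam. Once the filter has learned that the word is normal for this company, the score drops further still.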
Bayesian filters cannot eliminate incoming spam entirely. However, they make it harder for spammers to reach their targets’ inboxes, because a spammer would have to know both the words that normally appear in spam and the words that normally appear in the recipient’s valid emails. The latter differ from person to person.