Document Management - Detecting "problematic phrases" in user documents (Google) - Patent Application - PRIOR ART REQUEST


AN OVERBROAD PATENT ON detecting "problematic phrases" in a document - This application from Google seeks to patent the idea of alerting a user when a phrase in a document matches a "problematic phrase" in a database (i.e. a phrase with legal or policy issues for the company)! Ten minutes of your time can help narrow US patent applications before they become patents. Follow @askpatents on Twitter to help.

QUESTION - Have you seen anything that was published before 8/30/2011 that discusses:

  • Searching for problematic phrases in a document and alerting the user or a third party

If so, please submit evidence of prior art as an answer to this question. We welcome multiple answers from the same individual.

EXTRA CREDIT - A reference to anything that meets all of the criteria in the question above AND ALSO involves sending an alert to a third party if there is a match.

TITLE: Detecting problematic phrases

Summary: [Translated from Legalese into English] A method for detecting "problematic phrases" in documents and alerting the user when he has entered one. A problematic phrase is one which has legal implications or is a policy violation for the company.

  • Publication Number: US 20130110748 A1
  • Application Number: US 13/599,731
  • Assignee: Google
  • Prior Art Date: Seeking prior art predating 8/30/2011
  • Open for Challenge at USPTO: Open through 10/29/2013

Claim 1 requires each and every step below:

A method of identifying problematic phrases in an electronic document, comprising:

  1. detecting a context of the electronic document;

  2. capturing a textual phrase entered by a user;

  3. comparing the textual phrase against a database of phrases previously identified as having legal implications or violating policy; and

  4. alerting the user via an in-line notification when the textual phrase matches a phrase in the database having legal implications or violating policy, based on the detected context of the electronic document.

In English this means:

A method of identifying problematic phrases in a document, comprising:

  1. Detecting a context of the document (e.g. Word file, Spreadsheet file, intended recipient of the document, etc.)

  2. Capturing a text phrase entered by the user

  3. Comparing the text phrase against a database of problematic phrases (e.g. with legal implications or policy violations)

  4. Alerting the user in-line when he has entered a problematic phrase in a document, based on the context
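
For anyone comparing candidate prior art against the claim, the four steps can be sketched in a few lines of Python. This is only a minimal illustration and not the applicant's implementation: the `PROBLEM_PHRASES` table, the file-extension heuristic in `detect_context`, and the `Alert` record are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical phrase database, keyed by document context (assumption).
PROBLEM_PHRASES = {
    "email": {"kill the company": "possible legal implications"},
    "spreadsheet": {"off the books": "possible policy violation"},
}


@dataclass
class Alert:
    phrase: str
    reason: str
    start: int  # character offset, so a UI could mark the phrase in-line


def detect_context(filename: str) -> str:
    """Step 1: guess a context from the file extension (illustrative only)."""
    if filename.endswith((".eml", ".msg")):
        return "email"
    if filename.endswith((".xls", ".xlsx", ".csv")):
        return "spreadsheet"
    return "other"


def check_text(filename: str, text: str) -> List[Alert]:
    """Steps 2-4: take the entered text, compare it with the phrases listed
    for the detected context, and return one alert per matching phrase."""
    context = detect_context(filename)
    lowered = text.lower()
    alerts = []
    for phrase, reason in PROBLEM_PHRASES.get(context, {}).items():
        start = lowered.find(phrase)
        if start != -1:
            alerts.append(Alert(phrase, reason, start))
    return alerts
```

Calling `check_text("memo.eml", "Project ABC is going to totally KILL the company")` returns a single alert, while the same text in a file with an unrecognized extension returns none. That difference is the "based on the context" part of step 4.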

Good prior art would be evidence of a system that did each and every one of these steps prior to 8/30/2011.

You're probably aware of ten pieces of art that meet these criteria already. Separately, the applicant is also claiming sending an alert to a third party if the user enters a problematic phrase.

Example from the Applicant: "Project ABC is going to totally KILL the company"

What is good prior art? Please see our FAQ.

Want to help? Please vote or comment on submissions below. We welcome you to post your own request for prior art on other questionable US Patent Applications.

Micah Siegel

Posted 2013-08-16T23:34:47.617

Reputation: 3 085



This article is relevant:

Smokey: Automatic Recognition of Hostile Messages, a paper presented at the AAAI conference in 1997

by Ellen Spertus, a Microsoft/MIT AI Lab researcher

The subject matter is detecting "flame" emails.

ABSTRACT Abusive messages (flames) can be both a source of frustration and a waste of time for Internet users. This paper describes some approaches to flame recognition, including a prototype system, Smokey. Smokey builds a 47-element feature vector based on the syntax and semantics of each sentence, combining the vectors for the sentences within each message. A training set of 720 messages was used by Quinlan’s C4.5 decision-tree generator to determine feature-based rules that were able to correctly categorize 64% of the flames and 98% of the non-flames in a separate test set of 460 messages. Additional techniques for greater accuracy and user customization are also discussed.
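
To give a feel for the approach the abstract describes, here is a toy Python sketch of per-sentence feature vectors that are combined per message. It is not Smokey's code: the three features, the `INSULTS` word list, the choice of summing as the way to combine vectors, and the hand-written rule (standing in for rules learned by C4.5) are all assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical word list; Smokey itself used 47 features, not these three.
INSULTS = {"idiot", "moron", "stupid"}


def sentence_features(sentence: str) -> List[int]:
    """Build a tiny feature vector for one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [
        int("you" in words or "your" in words),  # addresses the reader
        int(any(w in INSULTS for w in words)),   # contains an insult word
        int("!" in sentence),                    # contains an exclamation mark
    ]


def message_features(message: str) -> List[int]:
    """Split a message into sentences and sum the per-sentence vectors."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", message.strip()) if s]
    vectors = [sentence_features(s) for s in sentences]
    return [sum(col) for col in zip(*vectors)] if vectors else [0, 0, 0]


def looks_like_flame(message: str) -> bool:
    """Hand-written rule: second-person address plus an insult word."""
    second_person, insult, _exclaim = message_features(message)
    return second_person > 0 and insult > 0
```

With these assumptions, `looks_like_flame("You are an idiot! Thanks for nothing.")` is true and `looks_like_flame("Thanks for the helpful patch.")` is false. A real system would learn its rules from labelled messages, as the paper did.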

George White

Posted 2013-08-16T23:34:47.617

Reputation: 21 648

This also sounds very like a feature in some versions of the Eudora e-mail client. It would look for obscene and inflammatory words and phrases in messages (whether sent or received) and flag them with one to three chilli peppers depending on severity. It would also display a confirmation warning before allowing the sending of such a message. I recall helping my mother to set up a version of Eudora containing this feature in the early 2000s, although I can't be very specific about the date. – Chromatix – 2013-08-17T08:38:39.593


My 2006 thesis, "Knowledge discovery in corporate email : the compliance bot meets Enron", has an abstract which begins: "I propose the creation of a real-time compliance "bot" - software to momentarily pause each employee's email at the moment of sending and to electronically assess whether that email is likely to create liability or unanticipated expense for the corporation." It has a chapter on the kinds of things to look for (criminal and civil malfeasance, as well as other liability and risk issues), and a final chapter which includes a graphic of the sort of warning the bot could display to a user. Is that what you're seeking?

Addendum - I should also have mentioned that the thesis describes the methods for pre-processing/cleansing data and describes and demonstrates how keyword and concept searching could find inappropriate phrases.

Second Addendum - In re-reading the request, I should also note that Chapter 9 of the thesis describes informing the user of finding inappropriate content and identifying the appropriate third party (e.g., compliance, human resources) that will be informed. It shows a sample pop-up window of such notification.
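
For readers who want a concrete picture of that flow, here is a minimal Python sketch of pausing a message at send time, checking it against keywords, warning the sender, and recording which third party would be informed. It is not the code from the thesis: the `RULES` table, the `Finding` record, and the `send` function are hypothetical names, and the cleansing step is reduced to lowercasing and collapsing whitespace.

```python
from typing import Dict, List, NamedTuple


class Finding(NamedTuple):
    keyword: str
    category: str
    notify: str  # third party that would be informed


# Hypothetical keyword list and routing (assumption, for illustration).
RULES = {
    "shred the documents": ("legal", "compliance"),
    "don't tell hr": ("personnel", "human resources"),
}


def review_before_send(body: str) -> List[Finding]:
    """Cleanse the text, then look for each listed keyword."""
    text = " ".join(body.lower().split())  # lowercase, collapse whitespace
    return [Finding(k, cat, who) for k, (cat, who) in RULES.items() if k in text]


def send(body: str, outbox: List[str], notices: Dict[str, List[str]]) -> str:
    """Send a clean message, or hold it and record who would be notified."""
    findings = review_before_send(body)
    if not findings:
        outbox.append(body)
        return "sent"
    for f in findings:
        notices.setdefault(f.notify, []).append(f.keyword)
    details = "; ".join(
        f"'{f.keyword}' ({f.category}, reported to {f.notify})" for f in findings
    )
    return "held: " + details
```

The string returned by `send` plays the role of the pop-up shown to the user, and the `notices` dictionary plays the role of the message to compliance or human resources.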

K. Krasnow Waterman

Posted 2013-08-16T23:34:47.617

Reputation: 21


Most keyloggers, monitoring software, software to track computer activities, etc. provide the same capability in a broad sense, i.e. they can be used to do what the patent applicant is seeking to patent. I suspect several such products were in existence prior to 8/30/2011. For example, this product was available in June 2011.

Siraj K

Posted 2013-08-16T23:34:47.617

Reputation: 11


This sounds similar to a web application firewall.

ModSecurity is one such application. It works as a module for the Apache web server and, whilst it is intended more as a way to protect systems than to guide editorial decisions, it does:

  • distinguish between text input and documents submitted via HTTP POST and PUT
  • detect all user submitted content
  • check for 'problematic' words or phrases submitted in documents and forms against a set of rules (that you can also add to, remove or change yourself)
  • alert the user via an error message and code

You can configure alerts from ModSecurity as you like. Out of the box it doesn't really explain to the user what is wrong with what they have submitted, but it would be perfectly feasible to display exactly what rule was broken with a little bit of code.
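
As a rough idea of how little code that would take, here is a Python sketch of a rule list with IDs that reports which rule a submission broke. It is not ModSecurity's rule language or API: the `RULES` list, the rule IDs, and the `inspect` function are hypothetical and only illustrate the idea.

```python
import re
from typing import Tuple

# Hypothetical rules: (rule id, compiled pattern, message shown to the user).
RULES = [
    (1001, re.compile(r"<script\b", re.IGNORECASE), "script tag in submitted text"),
    (1002, re.compile(r"\bunion\s+select\b", re.IGNORECASE), "SQL keyword sequence"),
]


def inspect(method: str, body: str) -> Tuple[int, str]:
    """Return an HTTP-style (status, message) pair for a submitted body."""
    if method not in ("POST", "PUT"):
        return 200, "not inspected"
    for rule_id, pattern, message in RULES:
        if pattern.search(body):
            # Name the rule so the user can see what was wrong.
            return 403, f"Blocked by rule {rule_id}: {message}"
    return 200, "OK"
```

Here `inspect("POST", "hello <SCRIPT>alert(1)</script>")` returns status 403 along with the ID and description of the first matching rule, so the user learns what triggered the block rather than seeing only an error code.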

There are articles and books listed from as early as 2003.


Posted 2013-08-16T23:34:47.617

Reputation: 49


The good old Linux sudo command will say:

username is not in the sudoers file. This incident will be reported.

if the user trying to execute the command is not allowed to do so (i.e. a policy violation). A mail is sent to the root user (the third party), and the user who tried is informed as well. Interestingly, the check is done using a file (i.e. checking against a database of phrases). This is not exactly about phrases in a document, but I think the generalization is trivial.

Amir Ali Akbari

Posted 2013-08-16T23:34:47.617

Reputation: 111


Word has had autocorrect for ages, and it has been used in this manner, though I don't know since when.


Posted 2013-08-16T23:34:47.617

Reputation: 418