Jul 2006
Writing is hard: anyone who claims otherwise has never attempted serious writing. Technical writing is difficult because technical information must be balanced against good (or at least passable) prose. Writing with only a basic ASCII editor such as vi is daunting because of an apparent lack of tools to assist the writer. Although editors like vi do lack built-in tools, there is no shortage of external tools and utilities to help. One such pair of writing tools is diction and style.
Diction and style do not follow a pure set of rules for grammar checking; instead, they follow a set of algorithms that drive towards heuristic correction. Diction pushes towards more correct, but not exact, writing: it is accurate but not precise. Conversely, style grades words, sentences, paragraphs and ultimately an entire body of text based on a set of grading algorithms. None of the style algorithms by itself may offer exactly what a writer wants, but each is precise relative to its own formula. Coupled together, diction and style help the writer find a more correct direction in which to push material. The ultimate reward of using tools like diction and style is that the more one uses them, the more correct first drafts become over time. The same cannot be said for most word processing tools.
The results of diction and style do have some peculiarities. For instance, diction alone is not well suited for overly technical material simply because it is designed for common writing. Not all of the grades from the style algorithms apply to every topic; indeed, some may be thrown out completely depending upon the subject. Most technical writing, for example, is only concerned with 3 or 4 of the algorithms. Using external tools also presents the problem of not editing in place. A file must be processed by diction and style outside of the editor, although it can be edited at the same time (say, in another terminal window).
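The workflow above (edit in one terminal, check in another) can be wrapped in a small shell function. This is a minimal sketch, not part of either tool: the name check_draft is hypothetical, and it assumes only that diction and style accept a file name as an argument, as they do in the examples in this article.

```shell
# check_draft is a hypothetical helper name, not something shipped with diction.
# It runs diction and then style on one saved draft, skipping a missing tool.
check_draft() {
    draft=$1
    if [ ! -r "$draft" ]; then
        echo "check_draft: cannot read $draft" >&2
        return 1
    fi
    for tool in diction style; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $tool $draft =="
            "$tool" "$draft"
        else
            echo "== $tool is not installed, skipped ==" >&2
        fi
    done
    return 0
}
```

Save the draft in the editor, then run check_draft with the draft's file name in the second terminal as often as needed.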
A little practice demonstrates how easily diction and style can help.
The tools can be obtained via a package manager or directly from GNU's Free Software Directory. The build and install process is the same as most GNU packaged software:
tar xzvf diction-X.XX.tar.gz
cd diction-X.XX/
./configure
make
(as root or sudo) make install
To understand how diction and style work, an example is needed. Following is some purposefully flawed text:
This is some text, some of this text is easy to read and may well help you comprehend the very very difficult tasks that lay ahead regarding your new system.
The example is fraught with some obvious errors. Following is the output of diction after one pass:
test:1: This is some text, some of this text is [easy -> (weak definition)] to read and [may -> = Do not confuse with "can".] well help you comprehend the [very -> (use sparingly; try to use words that are strong in themselves for emphasis)] [very -> (use sparingly; try to use words that are strong in themselves for emphasis)] difficult tasks that [lay -> A transitive verb, not to be confused with the intransitive verb "lie". You "lie" down, and you "lay" an egg. However, note that the past tense of ``lie'' is ``lay'': Yesterday, I lay down and laid an egg.] ahead regarding your new [system -> Frequently used without need.].
The output of diction can be daunting; breaking it up makes the example easier to follow:
test:1: This is some text, some of this text is [easy -> (weak definition)]
What diction is saying is that the sentence fragment is not a strong sentence. In most cases, when a weak definition is found, rewording or tossing out the sentence is the best solution.
[may -> = Do not confuse with "can".]
Good advice when it applies, but does it help here? On rereading the sentence, using may and well so closely together does not sound right.
[very -> (use sparingly; try to use words that are strong in themselves for emphasis)]
Good advice: the sentence does not need very in it at all.
[lay -> A transitive verb...
Straightforward fair warning - do not use lay unless the definition actually applies:
[lay -> A transitive verb, not to be confused with the intransitive verb "lie". You "lie" down, and you "lay" an egg. However, note that the past tense of ``lie'' is ``lay'': Yesterday, I lay down and laid an egg.]
So once again the text would likely be better served without it.
ahead regarding your new [system -> Frequently used without need.].
The system warning can be ignored; remember, diction is working from a common grammar base and not technical material alone. A more succinct sentence:
echo "Within this simple text is information to help setup a system." \
| diction
(stdin):1: Within this simple text is information to help setup a [system -> Frequently used without need.].
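On a longer draft, reading every bracketed suggestion in place gets tedious. The sketch below tallies which words diction flagged and how often. It assumes the "[word -> advice]" layout seen in the output above; the function name diction_tally is hypothetical, and grep -o is a common extension rather than strict POSIX.

```shell
# diction_tally is a hypothetical helper: it reads diction output on stdin and
# prints "count<TAB>flagged phrase", sorted alphabetically by phrase.
diction_tally() {
    grep -o '\[[^]>]* -> ' |
        sed -e 's/^\[//' -e 's/ -> $//' |
        sort | uniq -c |
        awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }'
}

# Demo on a fragment of the output shown earlier in this article.
diction_tally <<'EOF'
test:1: This is some text, some of this text is [easy -> (weak definition)] to read
the [very -> (use sparingly)] [very -> (use sparingly)] difficult tasks
your new [system -> Frequently used without need.].
EOF
```

In practice the input would come straight from the tool, for example by piping the output of diction on a draft into diction_tally. A word that tops the tally and is legitimate for the subject, such as system in technical material, can then be skimmed past with a clear conscience.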
Running style returns the results of a variety of readability tests. The ones this site looks at closely are the Kincaid formula (developed for military training manuals), the Flesch formula and the Fog and SMOG indexes. Only a few are used because of the technical nature of the material. The manual page describes all of the results. Here is some sample output:
[jrf@vela:~$] lynx --dump http://systhread.net/texts/200607subver.php | style
readability grades:
Kincaid: 9.5
ARI: 11.4
Coleman-Liau: 12.9
Flesch Index: 62.0
Fog Index: 13.2
Lix: 48.6 = school year 9
SMOG-Grading: 11.9
sentence info:
5760 characters
1181 words, average length 4.88 characters = 1.48 syllables
60 sentences, average length 19.7 words
55% (33) short sentences (at most 15 words)
16% (10) long sentences (at least 30 words)
24 paragraphs, average length 2.5 sentences
0% (0) questions
70% (42) passive sentences
longest sent 73 wds at sent 38; shortest sent 4 wds at sent 33
word usage:
verb types:
to be (56) auxiliary (22)
types as % of total:
conjunctions 5% (58) pronouns 2% (20) prepositions 8% (98)
nominalizations 2% (25)
sentence beginnings:
pronoun (2) interrogative pronoun (0) article (17)
subordinating conjunction (2) conjunction (0) preposition (6)
As per the man page, the Kincaid grade is geared towards technical documents and grades difficulty from 5.5 to 16.3. A grade of around 14 is where an author might become concerned about the material if it is meant to be digested by a large audience.
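As a rough sanity check, the Kincaid grade can be recomputed from the averages in the sample output. This sketch assumes the commonly published form of the formula (11.8 times syllables per word, plus 0.39 times words per sentence, minus 15.59); the function name kincaid is hypothetical.

```shell
# kincaid SYLLABLES_PER_WORD WORDS_PER_SENTENCE
# Assumes the commonly published Flesch-Kincaid grade formula.
kincaid() {
    awk -v syl="$1" -v wps="$2" \
        'BEGIN { printf "%.1f\n", 11.8 * syl + 0.39 * wps - 15.59 }'
}

# Averages taken from the sample output: 1.48 syllables per word,
# 19.7 words per sentence.
kincaid 1.48 19.7    # prints 9.6
```

The result is within a tenth of the 9.5 reported by style; the small gap is plausibly because the averages printed in the report are already rounded.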
The Flesch formula grades in reverse on a scale of 0-100 (a higher score means easier reading) and is targeted towards school texts ranging from grades 3-12. By itself Flesch may not prove useful, but it helps to a degree. For instance, a low result might mean that the technical material is fine but general readability is difficult.
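The reversed scale is easier to see with the formula in hand. This sketch assumes the commonly published Flesch reading ease formula (206.835, minus 84.6 times syllables per word, minus 1.015 times words per sentence); the function name flesch is hypothetical.

```shell
# flesch SYLLABLES_PER_WORD WORDS_PER_SENTENCE
# Assumes the commonly published Flesch reading ease formula.
flesch() {
    awk -v syl="$1" -v wps="$2" \
        'BEGIN { printf "%.1f\n", 206.835 - 84.6 * syl - 1.015 * wps }'
}

# Averages taken from the sample output.
flesch 1.48 19.7     # prints 61.6
```

Both terms are subtracted, so longer words and longer sentences push the score down. The result is close to the 62.0 in the sample output, with the difference plausibly down to the rounded averages.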
The Fog index is another school grade, and generally a 12 or above is too hard. A high Fog index is not necessarily bad; in the example it illustrates that someone with no concept of the topic or related topics cannot understand it, which makes sense. If the topic had been about the color to paint a shed, then the author should be concerned. The SMOG grading is a straight estimate of the school grade level required to read the text. A value of 11.9 is actually good considering the subject material.
The key to using style is looking for better scores for the type of material the author is writing. Not all scores will ever be great; generally, scoring decently on at least half of the scores an author is concerned with does the job.
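Picking out only the scores that matter for a given kind of material can be scripted. The sketch below reads style output and reports the three grades discussed above, marking Kincaid at 14 or more and Fog at 12 or more for review. Those cut-offs are this article's rules of thumb rather than anything style enforces, the function name grade_report is hypothetical, and the parsing assumes the "Name: value" layout of the sample output.

```shell
# grade_report is a hypothetical helper: it reads style output on stdin and
# reports Kincaid, Flesch and Fog, using this article's rough cut-offs.
grade_report() {
    awk -F': *' '
        { sub(/^[ \t]+/, "") }    # drop any leading indentation
        $1 == "Kincaid"      { print "kincaid", $2 + 0, ($2 + 0 >= 14 ? "review" : "ok") }
        $1 == "Flesch Index" { print "flesch", $2 + 0, "(0-100, higher reads easier)" }
        $1 == "Fog Index"    { print "fog", $2 + 0, ($2 + 0 >= 12 ? "review" : "ok") }
    '
}

# Demo on the readability grades from the sample output.
grade_report <<'EOF'
readability grades:
Kincaid: 9.5
ARI: 11.4
Coleman-Liau: 12.9
Flesch Index: 62.0
Fog Index: 13.2
EOF
# prints:
# kincaid 9.5 ok
# flesch 62 (0-100, higher reads easier)
# fog 13.2 review
```

A "review" mark is only a prompt to look again; as noted above, a high Fog index can be perfectly reasonable for specialized material.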
While the ASCII editing world does not have straightforward grammar checking tools, it does have non-intrusive tools like diction and style to point writers in the right direction.