NEWS
October 8, 2007 - Provalis Research releases new version 3.0 of QDA Miner
May 4, 2006 - Provalis Research releases new version 5.1
of WordStat.
May 26, 2005 - The Global Aviation Information Network (GAIN)
announces the availability of a technical report on technology
demonstrations in which Provalis Research's statistical content
analysis and text mining software has been used for the analysis
of airline safety reports.
WordStat 5.1
Computer Assisted Text Analysis
What's New in Version 5.0?
DICTIONARIES PAGE
- A new pre-processing option lets you create your own text
pre-processing EXE or DLL (a sample English Porter stemmer and
an n-grams transformation are included).
- A new lemmatization monitoring dialog allows reviewing
substitutions and overriding existing ones by creating custom
substitutions.
- Disambiguation rules with Boolean (AND, OR, NOT) and proximity
operators (NEAR, AFTER, BEFORE) may now be added to categorization
dictionaries.

- A new "As shown" level setting sets the categorization level
to match the way the dictionary tree is displayed.
- Ability to create unbreakable categories
(overriding the level setting).
- Categorization dictionaries may now
be printed or exported to XML.
- Ability to merge existing dictionaries.
- Improved contextual menus for faster
dictionary editing.
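
To illustrate what a proximity-based disambiguation rule does, here is a minimal Python sketch of a NEAR test. It is not WordStat's rule syntax or implementation: the function name `near`, the `window` parameter, and the whitespace tokenization are all assumptions made for the example.

```python
def near(tokens, target, context, window=3):
    """Return positions of `target` that have `context` within `window` tokens.

    Hypothetical helper: the window size and matching rules are assumptions,
    not the behavior of any specific product.
    """
    context_positions = [i for i, tok in enumerate(tokens) if tok == context]
    return [
        i
        for i, tok in enumerate(tokens)
        if tok == target
        and any(abs(i - j) <= window for j in context_positions)
    ]


# Only occurrences of "bank" close to "river" should count as the
# "riverbank" sense of the word.
tokens = "the bank of the river flooded while the bank raised interest rates".split()
river_sense = near(tokens, "bank", "river", window=3)
```

With a window of three words only the first "bank" qualifies; widening the window to four would also accept the second one, which shows why the window size matters when writing such rules.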
OPTIONS PAGE
- A new option allows one to include cases with missing values
on independent variables (overriding the existing listwise
exclusion).
- Feature to select a variable to weight
cases.
- New threshold to remove items occurring
in more than a specified % of cases.
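
The case-percentage threshold can be pictured with a short Python sketch. It assumes each case is a list of items (words or categories) and that an item is dropped when the share of cases containing it exceeds the limit; the function name `drop_frequent_items` and its parameters are hypothetical, not part of WordStat.

```python
def drop_frequent_items(cases, max_case_pct):
    """Remove items that occur in more than `max_case_pct` percent of cases.

    Illustrative only: `cases` is a list of token lists, and an item is
    counted once per case regardless of how often it repeats there.
    """
    n_cases = len(cases)
    case_counts = {}
    for case in cases:
        for item in set(case):
            case_counts[item] = case_counts.get(item, 0) + 1
    keep = {
        item
        for item, count in case_counts.items()
        if 100.0 * count / n_cases <= max_case_pct
    }
    return [[item for item in case if item in keep] for case in cases]


cases = [["the", "cat"], ["the", "dog"], ["the", "cat", "bird"], ["fish"]]
filtered = drop_frequent_items(cases, max_case_pct=60)
```

Here "the" appears in 75% of the cases and is removed, while "cat" (50%) is kept.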
FREQUENCIES PAGE
- Ability to create files of keyword frequency norms and compare
existing frequencies to previously saved norm files.
- An entirely new keyword retrieval dialog allows one to extract
documents, paragraphs or sentences with user-defined combinations
of keywords. Retrieved text segments may optionally be tagged
using QDA Miner codes.
- The full categorization process may now be stored on disk and
applied to documents using a standalone utility program (WS
Document Classifier) or an optional DLL and command line program.
- Optional colored grid lines.
- Included items may be removed temporarily
from further analysis.
- Added TF*IDF column (term frequency
x inverse document frequency).
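
As a rough illustration of the TF*IDF statistic, the sketch below uses one common variant, raw term frequency multiplied by log(N / document frequency). The exact weighting formula WordStat applies is not stated here, so treat the formula and the function name `tf_idf` as assumptions.

```python
import math


def tf_idf(docs):
    """Return one {term: tf * idf} dict per document.

    Assumed variant: tf is the raw count in the document and
    idf = log(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1

    scores = []
    for doc in docs:
        term_freq = {}
        for term in doc:
            term_freq[term] = term_freq.get(term, 0) + 1
        scores.append(
            {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in term_freq.items()}
        )
    return scores


docs = [["safety", "report", "safety"], ["report", "delay"]]
scores = tf_idf(docs)
```

A term that occurs in every document ("report" above) gets a weight of zero, while a term concentrated in few documents is weighted up.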
AUTOMATIC DOCUMENT CATEGORIZATION
- Naive Bayes and k-nearest neighbors
classification methods applied on occurrences, frequencies,
percentage of words, etc.
- Feature selection and feature weighting.
- Crossvalidation methods (leave one out,
n-folds, split sample).
- Batch experiment module and history
charts for model optimization.
- Document classification on single texts,
list of documents or database.
- The classification model may be stored
on disk and applied to external documents using a standalone
utility program (WS Document Classifier) or an optional DLL
and command line program.
- Optional colored grid lines.
- Included items may be removed temporarily
from further analysis.
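
The combination of a Naive Bayes classifier with leave-one-out crossvalidation can be sketched in a few lines of Python. This is a generic multinomial Naive Bayes with add-one smoothing applied to word frequencies, written for illustration; it is not WordStat's implementation, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_out_accuracy(docs, labels):
    """Hold out each document once, train on the rest, and score the guess."""
    correct = 0
    for i in range(len(docs)):
        model = train_nb(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:])
        correct += predict_nb(model, docs[i]) == labels[i]
    return correct / len(docs)


# Toy data, invented for the example.
docs = [["engine", "failure"], ["engine", "fire"],
        ["late", "baggage"], ["lost", "baggage"]]
labels = ["safety", "safety", "service", "service"]
model = train_nb(docs, labels)
accuracy = leave_one_out_accuracy(docs, labels)
```

N-fold and split-sample validation follow the same pattern; only the way the held-out documents are chosen changes.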
KEYWORD-IN-CONTEXT PAGE
- The KWIC page may now be detached and
displayed as a stay-on-top dialog.
FEATURE EXTRACTION PAGE
- Phrase finder page has been moved to
the feature extraction page.
- A new Unknown Words finder allows one to quickly identify
misspelled words, acronyms, technical words and proper nouns,
and then replace them, ignore them or assign them to the
categorization dictionary.
CLUSTER ANALYSIS
- Added probabilistic versions of Jaccard
and Sorensen (or Dice) coefficients.
- Added second order clustering of keywords
(based on the similarity of co-occurrence patterns rather than
mere co-occurrences).
- Ability to select a single cluster and
retrieve associated documents.
- New option to hide single item clusters
in dendrograms and multidimensional scaling plots.
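
For reference, the standard (non-probabilistic) Jaccard and Sorensen (Dice) coefficients can be computed from the sets of cases in which two keywords occur. The probabilistic versions mentioned above are not defined on this page, so the Python sketch below covers only the classic forms; the function names and case IDs are made up for the example.

```python
def jaccard(cases_a, cases_b):
    """Shared cases divided by cases containing either keyword."""
    a, b = set(cases_a), set(cases_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sorensen_dice(cases_a, cases_b):
    """Twice the shared cases divided by the sum of both case counts."""
    a, b = set(cases_a), set(cases_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical case IDs in which each keyword was found.
engine_cases = {1, 2, 3, 4}
fire_cases = {3, 4, 5}
j = jaccard(engine_cases, fire_cases)
d = sorensen_dice(engine_cases, fire_cases)
```

Both measures ignore cases where neither keyword occurs; Dice gives more weight to the shared cases, so it is never lower than Jaccard for the same pair.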
PROXIMITY PLOT
- New option to retrieve documents or
text segments containing two specific keywords.
OTHERS
- The document conversion wizard can now
extract text from PDF files.
- Categorization models and classification
rules may be saved on disk.
- "Anchor to floor" lines on
3D charts (MDS and correspondence plot).
- WS Document Classifier, a small standalone
application to apply previously saved categorization and classification
models to external documents.
- Separately sold DLL and command line
versions of WordStat for standalone content analysis and automatic
classification of documents (not available yet).
- Major speed improvements. The table below provides some speed
comparisons between v4 and v5. This test was performed on a
1.2 GHz Pentium III computer.
| TASK | VERSION 4.0 | VERSION 5.0* | SPEED IMPROVEMENT |
| Word frequency of 11,314 newsgroup messages (3,249,029 words) | 5m 52s | 2m 59s | x2.0 |
| - with lemmatization & stop list | 6m 45s | 3m 11s | x2.1 |
| - categorized using Regressive Imagery Dictionary (RID) | 10m 4s | 2m 24s | x4.2 |
| - categorized using Linguistic Inquiry and Word Count (LIWC) | 10m 52s | 2m 52s | x3.8 |
| Keyword-in-Context list on 11,314 newsgroups (142 hits) | 2m 49s | 4s | x42.2 |
* Speed improvements may differ on other computers.
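
The improvement factors appear to be the version 4.0 time divided by the version 5.0 time. The short Python sketch below reproduces that arithmetic from the durations as printed; the helper names `to_seconds` and `speedup` are invented for the example.

```python
import re


def to_seconds(duration):
    """Convert a duration such as '5m 52s' or '4s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", duration.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return 60 * minutes + seconds


def speedup(v4_time, v5_time):
    """Ratio of the older timing to the newer one."""
    return to_seconds(v4_time) / to_seconds(v5_time)


rid_factor = speedup("10m 4s", "2m 24s")   # 604 s / 144 s
kwic_factor = speedup("2m 49s", "4s")      # 169 s / 4 s
```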