Provalis Research Text Analytics Software
 

NEWS

May 8, 2012 - Provalis Research releases three sentiment analysis dictionaries in WordStat format.

December 12, 2011 - Provalis Research releases version 4.0 of the QDA Miner qualitative data analysis software.

August 30, 2010 - Provalis Research announces the release of WordStat 6.1 content analysis and text mining software.

April 21, 2010 - Provalis Research announces the release of WordStat 6 content analysis and text mining software.

Content Analysis and Text Mining Software v6.1

LIST OF FEATURES


TEXT PROCESSING CAPABILITIES

  • Content analysis on collections of large documents and on short alphanumeric variables (up to 255 characters).
  • Dictionary moderated lemmatization and stemming (English, French, Italian, German and Spanish; contact us for other languages).
  • Ability to call an external text pre-processing EXE or DLL (a sample English Porter stemmer and an n-grams transformation are included).
  • Optional exclusion of pronouns, conjunctions, etc., by the use of user-defined exclusion lists (stop lists).
  • Categorization of words or phrases using existing or user-defined content analysis dictionaries.
  • Word categorization based on Boolean (AND, OR, NOT) and proximity rules (NEAR, AFTER, BEFORE)
  • Word and phrase substitution and scoring using wildcards and weighting.
  • Frequency analysis on keywords, phrases, derived categories or concepts, or user-defined codes entered manually within a text.
  • Interactive development and easy maintenance of hierarchical content analysis dictionaries, taxonomies, or categorization schema.
  • Drag-and-drop editor for easy assignment of words and phrases to categories.
  • Ability to restrict the analysis to specific portions of a text or to exclude comments and annotations.
  • Ability to perform a word frequency analysis on a random sample of cases (useful for large projects).
  • Integrated spell-checking with support for more than 20 languages, including English, French and Spanish.
  • Integrated thesauruses to assist the creation of taxonomies and comprehensive categorization schemas (English, French, Spanish, Italian, Portuguese and German).
  • Powerful case filtering on any numeric or alphanumeric field and on code occurrence.
  • Prints presentation-quality tables and graphics.
  • Imports ANSI and Unicode text files, MS Word, WordPerfect, RTF, HTML and PDF files.
  • Exports any table to Excel, SPSS, ASCII, tab-separated or comma-separated value files, or HTML files.
  • Flexible keyword highlighting (the text editor can display all categories using different colors).
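
To make the idea of dictionary-based categorization concrete, here is a minimal Python sketch that combines an exclusion list, wildcard patterns and weights. It is not WordStat's implementation or dictionary format; the names `STOP_WORDS`, `DICTIONARY` and `categorize`, the sample entries and the weights are all assumptions made for illustration.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical exclusion list and dictionary (illustrative entries and weights only).
STOP_WORDS = {"the", "a", "and", "of", "is", "was", "but"}
DICTIONARY = {
    "POSITIVE": {"good": 1.0, "excellen*": 2.0},
    "NEGATIVE": {"bad": 1.0, "terribl*": 2.0},
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def categorize(text):
    """Return the weighted score of each category, skipping excluded words."""
    scores = Counter()
    for word in tokenize(text):
        if word in STOP_WORDS:
            continue
        for category, patterns in DICTIONARY.items():
            for pattern, weight in patterns.items():
                # fnmatchcase treats "*" as a wildcard matching any word ending.
                if fnmatchcase(word, pattern):
                    scores[category] += weight
    return dict(scores)

print(categorize("The service was excellent and the food was good, but the decor is terrible."))
```

In this sketch "excellent" matches the wildcard entry `excellen*` and contributes its weight of 2.0 to the POSITIVE category, while words on the exclusion list are never looked up.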

UNIVARIATE WORD FREQUENCY ANALYSIS

  • Univariate word frequency analysis (word or category count and record occurrence).
  • Word x word co-occurrence analysis matrix.
  • Word x case data matrix.
  • Word clustering for automatic identification of main themes and topics.
  • Integrated multidimensional scaling with 2D and 3D maps.
  • Proximity plot.
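
The difference between a word count, a record occurrence and a word x case matrix can be shown with a short Python sketch. The function names and the sample cases are hypothetical, and the sketch makes no claim about how WordStat computes these tables.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def frequency_table(cases):
    """Return (total count of each word, number of cases containing each word)."""
    word_count = Counter()
    case_count = Counter()
    for text in cases:
        tokens = tokenize(text)
        word_count.update(tokens)       # every occurrence counts
        case_count.update(set(tokens))  # each case counts at most once per word
    return word_count, case_count

def word_by_case_matrix(cases, vocabulary):
    """Return one row per case holding the frequency of each vocabulary word."""
    return [[tokenize(text).count(word) for word in vocabulary] for text in cases]

cases = ["price price quality", "quality service", "price service service"]
word_count, case_count = frequency_table(cases)
print(word_count["price"], case_count["price"])
print(word_by_case_matrix(cases, ["price", "quality", "service"]))
```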

    Screenshots: word frequency analysis, word clustering, concept mapping, and analysis of word co-occurrences with proximity plot.

VOCABULARY AND PHRASE EXTRACTION

  • Vocabulary finder extracts technical terms, product and company names as well as common misspellings.
  • Phrase extraction tool allows one to easily identify recurring phrases and idioms.
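
One simple way to find recurring phrases is to count word n-grams and keep those that appear more than once. The Python sketch below illustrates that general idea only; `recurring_phrases` and its parameters are hypothetical names, and the extraction method actually used by the product is not described on this page.

```python
import re
from collections import Counter

def recurring_phrases(texts, n=2, min_count=2):
    """Return the n-word phrases that occur at least min_count times overall."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        # Slide a window of n words over the tokens of each text.
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {phrase: count for phrase, count in counts.items() if count >= min_count}

texts = ["customer service was slow", "great customer service", "the customer service desk"]
print(recurring_phrases(texts))
```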

    Screenshot: phrase frequency analysis.

NORM CREATION AND COMPARISON

  • Ability to create norm files based on word frequency analysis or on frequencies of content categories.
  • Comparison of concept or word frequencies to previously saved norm files.

KEYWORD RETRIEVAL FUNCTION

  • A powerful keyword retrieval function allows identification of text units (documents, paragraphs or sentences) containing one keyword or a combination of keywords, with optional filtering of cases.
  • Ability to attach QDA Miner qualitative codes to retrieved text segments.
  • Retrieved segments may be exported to disk in tabular format (Excel, SPSS, or delimited text files) or as text reports (Rich Text Format).
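
As a rough illustration of retrieving text units by a combination of keywords, the following Python sketch returns the sentences that contain all required keywords and none of the excluded ones. The function name, its `all_of` and `none_of` parameters and the naive sentence splitter are assumptions for this example, not part of WordStat.

```python
import re

def retrieve_sentences(documents, all_of=(), none_of=()):
    """Return (document id, sentence) pairs matching the keyword combination."""
    hits = []
    for doc_id, text in documents.items():
        # Naive sentence splitting on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            has_all = all(keyword in words for keyword in all_of)
            has_none = not any(keyword in words for keyword in none_of)
            if has_all and has_none:
                hits.append((doc_id, sentence))
    return hits

documents = {
    "d1": "The price is high. The service is good.",
    "d2": "High price and poor service. Nice staff.",
}
print(retrieve_sentences(documents, all_of=("price",), none_of=("service",)))
```

The same loop could operate on paragraphs or whole documents by changing how the text is split.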

           

KEYWORD CO-OCCURRENCE ANALYSIS

  • Integrated clustering and dendrogram display of keyword co-occurrence.
  • First- and second-order proximity analysis.
  • Proximity plot to easily identify all keywords co-occurring with a target word or content category.
  • 2D and 3D multidimensional scaling on either joint frequency or co-occurrence of words or categories.
  • Flexible keyword co-occurrence criteria (within a case, a sentence, a paragraph, a window of n words, a user-defined segment) as well as clustering methods (first- and second-order proximity, choice of similarity measures).
  • Easy text retrieval from dendrograms or proximity plots allows one to drill down to the original sources.
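
Co-occurrence analysis starts from a similarity value for each pair of keywords. The Python sketch below computes one such value, the Jaccard coefficient, using co-occurrence within a case as the criterion. Jaccard is chosen here only as a familiar example of a similarity measure; the function name and sample data are hypothetical.

```python
import re
from itertools import combinations

def jaccard_cooccurrence(cases, keywords):
    """Return the Jaccard coefficient of each keyword pair, computed over cases."""
    token_sets = [set(re.findall(r"[a-z]+", case.lower())) for case in cases]
    result = {}
    for a, b in combinations(keywords, 2):
        both = sum(1 for tokens in token_sets if a in tokens and b in tokens)
        either = sum(1 for tokens in token_sets if a in tokens or b in tokens)
        # Cases containing both keywords divided by cases containing at least one.
        result[(a, b)] = both / either if either else 0.0
    return result

cases = ["price quality", "price quality service", "service staff", "price staff"]
for pair, value in jaccard_cooccurrence(cases, ["price", "quality", "service"]).items():
    print(pair, round(value, 3))
```

A matrix of such values is the usual input to hierarchical clustering or multidimensional scaling.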

CASE OR DOCUMENT SIMILARITY ANALYSIS

  • Hierarchical clustering, multidimensional scaling and proximity plot may be used to explore the similarity between documents or cases.
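
Similarity between two documents is often measured on their word frequency vectors. The following Python sketch uses cosine similarity, which is an assumption chosen for illustration; this page does not state which measures the software applies to cases or documents.

```python
import math
import re
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Return the cosine of the angle between two word frequency vectors."""
    a = Counter(re.findall(r"[a-z]+", text_a.lower()))
    b = Counter(re.findall(r"[a-z]+", text_b.lower()))
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    # Empty documents have no direction, so treat their similarity as zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(cosine_similarity("price and quality", "quality and price"))
print(cosine_similarity("price quality", "service staff"))
```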

MULTIPLE RESPONSES AND COMPARISONS

  • Univariate word frequency analysis and crosstabulation on information stored in several text fields.
  • Comparison of keyword frequency or occurrence between variables.
  • Computes inter-rater agreement measures (percentage of agreement, Cohen's Kappa, Scott's Pi, Krippendorff's R and r-bar, free marginal) based on codes manually entered in different variables.
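
Three of the agreement measures named above have short textbook definitions, sketched here in Python for two raters. Cohen's Kappa estimates chance agreement from each rater's own code distribution, while Scott's Pi pools both raters' distributions. The function names and the sample codes are illustrative, and the sketch assumes the raters do not both use a single identical code throughout (which would make the denominator zero).

```python
from collections import Counter

def percent_agreement(rater1, rater2):
    """Proportion of cases on which both raters entered the same code."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)

def cohen_kappa(rater1, rater2):
    """Agreement corrected for chance, using each rater's own marginals."""
    n = len(rater1)
    observed = percent_agreement(rater1, rater2)
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum((c1[code] / n) * (c2[code] / n) for code in set(rater1) | set(rater2))
    return (observed - expected) / (1 - expected)

def scott_pi(rater1, rater2):
    """Agreement corrected for chance, using the pooled marginals of both raters."""
    n = len(rater1)
    observed = percent_agreement(rater1, rater2)
    pooled = Counter(rater1) + Counter(rater2)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (observed - expected) / (1 - expected)

rater1 = ["A", "A", "B", "B"]
rater2 = ["A", "B", "B", "B"]
print(percent_agreement(rater1, rater2), cohen_kappa(rater1, rater2), scott_pi(rater1, rater2))
```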

COMPARISONS BETWEEN SUBGROUPS AND TEMPORAL TREND ANALYSIS

  • Comparison between any textual field and any nominal or ordinal variable (such as gender, age group, etc.).
  • Automatic transformation of date variables into week days, months, quarters of year, years, or decades for identification of temporal trends.
  • Choice between 11 different association measures to assess the relationship between word occurrence and nominal or ordinal variables (Chi-square, Likelihood ratio, Tau-a, Tau-b, Tau-c, symmetric Somers' D, asymmetric Somers' Dxy and Dyx, Gamma, Pearson's R, Spearman's Rho).
  • Computation of statistics on either absolute or relative frequencies.
  • Ability to sort the matrix in alphabetic order of words, by word frequency or word occurrence, by the obtained statistic or by its probability.
  • Visually compare items between subgroups using bar charts and line charts.

                 

  • Correspondence analysis (statistics, 2D & 3D joint plots). This feature is accessible from the crosstab page and allows one to see graphically the relationship between nominal variables and codes resulting from a content analysis.
  • Heatmap plot (with dual-clustering of keywords and variables)
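
Of the association measures listed in this section, the Pearson chi-square statistic is the easiest to illustrate. The Python sketch below computes it for a contingency table whose rows are subgroups and whose columns are the number of cases with and without a given word. The table values are made up, and the sketch assumes no row or column total is zero.

```python
def chi_square(table):
    """Return the Pearson chi-square statistic of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of subgroup and word occurrence.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical data: rows are two subgroups, columns are cases with / without the word.
table = [[30, 20], [10, 40]]
print(round(chi_square(table), 3))
```

The probability attached to the statistic would then be read from the chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom.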

            

AUTOMATED DOCUMENT CLASSIFICATION

  • Machine learning algorithms (Naive Bayes and K-Nearest Neighbors) for automatic document classification.
  • Flexible feature selection for automatic selection of best subsets of attributes.
  • Numerous validation methods (leave-one-out, n-fold cross-validation, split sample).
  • Experimentation module allows easy comparison of predictive models and fine-tuning of document classification models.
  • Document classification models may be saved to disk and applied later using a standalone document classification utility program, a command line program or a programming library. Note: The command line program and the programming library are part of the WordStat Software Developer's Kit (SDK), which is sold separately.
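
For readers unfamiliar with these techniques, here is a compact Python sketch of a multinomial Naive Bayes classifier with add-one smoothing, evaluated by n-fold cross-validation. It is a generic textbook version written for illustration; the function names, the training sentences and the way folds are assigned are assumptions and do not reflect WordStat's internals or SDK.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(docs):
    """Build a model from a list of (text, label) pairs."""
    label_counts, word_counts, vocabulary = Counter(), defaultdict(Counter), set()
    for text, label in docs:
        tokens = tokenize(text)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_validate(docs, n_folds):
    """Return the accuracy over n folds; document i is held out in fold i % n_folds."""
    correct = 0
    for fold in range(n_folds):
        held_out = docs[fold::n_folds]
        training = [doc for i, doc in enumerate(docs) if i % n_folds != fold]
        model = train(training)
        correct += sum(predict(model, text) == label for text, label in held_out)
    return correct / len(docs)

docs = [
    ("great product excellent quality", "pos"),
    ("terrible product poor quality", "neg"),
    ("excellent service great staff", "pos"),
    ("poor service terrible staff", "neg"),
    ("great excellent value", "pos"),
    ("terrible poor value", "neg"),
]
print(predict(train(docs), "excellent staff"), cross_validate(docs, 3))
```

Setting `n_folds` to the number of documents gives leave-one-out validation with the same code.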

            

KEYWORD-IN-CONTEXT (KWIC)

  • Ability to display a KWIC table to examine the textual context of a word, word pattern, or content category.
  • Ability to sort the KWIC table on any independent (numeric or categorical) variables.
  • Ability to jump to the document in order to view the full context or edit the original document.
  • KWIC list can be saved in data files (Excel, SPSS or delimited files) for further processing.
  • Customizable KWIC display (paragraph, sentence or user defined segment).
  • Concordance report (displays all hits as a list of paragraphs, sentences or user defined segments)
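
A keyword-in-context table lists every hit of a keyword together with a few words on either side. The Python sketch below shows the basic construction; the `kwic` function and its `width` parameter are hypothetical names for this example.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of the keyword."""
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows

text = "The price was fair. Nobody complained about the price of delivery."
for left, hit, right in kwic(text, "price", width=2):
    print(f"{left:>12} | {hit} | {right}")
```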

       

FULL INTEGRATION WITH STATISTICAL SOFTWARE AND QUALITATIVE ANALYSIS SOFTWARE

  • Alphanumeric variables can be stored in the same file as all other numeric and categorical variables.
  • Variable selection, statistical analysis and content analysis are performed within the same application program.
  • Results of a concept, keyword or word frequency analysis can be transformed into numerical or polynomial variables in the existing project or exported to disk into a new data file (Excel, SPSS, delimited files, etc.) for further statistical analysis (such as factor analysis, multiple regression, time series and other predictive modeling techniques).
  • Ability to perform numeric and alphanumeric transformations or to apply filters on records of the data file to restrict the analysis to specific subgroups.

UTILITY PROGRAMS

  • Dictionary building assistant to find related words (synonyms, antonyms, holonyms, meronyms, hypernyms, hyponyms) in a WordNet-based thesaurus (English only; 100,000 synonyms, 120,000 root words).

          

  • WS Document Classifier, a small standalone application to apply previously saved categorization and classification models to external documents.
  • Document Conversion Wizard, a utility program to easily import documents. Features include:

    • Direct import of plain text (ANSI, Unicode), HTML, RTF, MS Word, WordPerfect and Adobe PDF files.
    • Optional removal of leading and trailing spaces and hard returns in text files.
    • Extraction of numeric, alphanumeric and date variables from structured documents.
    • Extraction options may be saved on disk and later retrieved.