WORDSTAT

Content analysis and text mining software for fast and precise processing of large amounts of unstructured information

THE NEW FEATURES OF WORDSTAT 2026

The 2026 release continues WordStat’s commitment to combining transparent, rule-based text analysis with flexible AI-assisted tools. This update expands analytical and interpretive capabilities, refines natural-language querying, and improves model validation. It also enhances interoperability with existing knowledge structures by supporting the import of local SKOS and Turtle taxonomy files. In addition, it introduces several enhancements that streamline transitions between qualitative, text-analytic, and statistical tasks and make it easier to review or adjust supporting variables. Improvements to language-based queries, suggestion tools, and filtering options further support the systematic development of structured classification systems, taxonomies, and dictionaries.

1. Expanded Generative AI Deployment and Data Control Options (version 2026.1)

Building on our existing integration with Ollama, WordStat 2026.1 adds support for LM Studio and allows users to configure up to three custom URLs for private organization servers. These additions give researchers total control over data sovereignty. To understand how these new paths protect sensitive data compared to standard public clouds, read our blog post on Data Privacy and Generative AI.

2. New Analytical Scripts for Discourse and Style Assessment

Five new analysis scripts have been benchmark tested and added to assist in discourse and stylistic evaluation. These scripts measure coherence, confidence, subjectivity, emotionality, and bias in speech transcripts or written documents.

3. Natural Language Queries on Frequencies

The Frequencies page now supports queries formulated in natural language. Users can directly ask questions about words, content categories, or even leftover terms, allowing for a more intuitive and flexible exploration of frequency data.

4. AI-Assisted Item Selection in Suggestions

On the Suggestions tabs of the Frequencies, Topic Modeling and Phrases Extraction pages, users can now apply custom criteria expressed in plain language for the automatic selection of items. This feature simplifies the process of retaining or filtering terms according to relevance, helping to refine vocabularies or identify terms that represent distinct conceptual dimensions.

5. Enhanced AI Querying of Topic Models

AI queries within topic modeling have been expanded to allow analysis of either the selected topic or all topics simultaneously. Queries can also be limited to subsets of words such as displayed words, top terms, or suggested phrases, offering a more focused way to examine how themes and categories are defined across the corpus.

6. Expanded Prompt-Based Item Validation

Building on the 2025 feature that automatically identified true and false positive phrases linked to a topic or a specific phrase, version 2026 introduces an Explain Relationship prompt to identify the rationale for each selected item. This additional step acts both as a safeguard, detecting items incorrectly selected by the model, and as a source of insight, helping analysts understand domain-specific connections they may not be familiar with.

7. Expanded Embedding-Based Suggestions on the Frequencies Page

The Suggestions tab on the Frequencies page has been enhanced. Previously, it displayed only associated words, including synonyms, antonyms, and semantically related terms derived from word embeddings. It now also presents semantically associated phrases, organized into two groups: 1) Phrases containing the target word, and 2) Phrases semantically related but not containing it. This broader view provides a richer context for exploring conceptual associations and identifying meaningful phrase patterns.

8. Expanded Suggestions on the Phrase Extraction Page

The Suggestions tab on the Phrase Extraction page now includes individual words that are semantically associated or frequently co-occurring with the target phrase. This allows users to identify equivalent or complementary terms as well as acronyms, and to gain insight into conceptual associations that may inform category development or refinement.

9. Integrated Spreadsheet Editor for Data Variables

A new spreadsheet editor has been added for direct manipulation of numerical, categorical, logical, and date variables. It supports standard clipboard operations, allowing quick assignment or modification of values across many cases. This provides a more efficient way to review and edit project variables.

10. Seamless Access to Simstat for Statistical Analysis

WordStat can now launch Simstat directly from an open project, allowing users to run statistical analyses on numerical or categorical variables without manually switching between applications. The process is fully integrated: selecting the new command opens the current project in Simstat, and quitting Simstat automatically returns the user to WordStat with no additional file handling. This streamlines the workflow for users who rely on both tools for text-derived data and statistical testing.

11. Support for Local SKOS and Turtle Taxonomy Files

WordStat 2026 now supports importing local SKOS and Turtle files, making it easier to incorporate existing vocabularies, thesauri, and taxonomies into your projects. This improves interoperability with external standards and simplifies the reuse of industry- and domain-specific classification systems.

12. Multiple Expression Filtering for Phrases

The phrase filter now accepts multiple matching text expressions. For example, one may retrieve all phrases containing any of the terms “table”, “results”, “data”, or “figure”. This feature facilitates the review and consolidation of phrase groups, improving the efficiency of lexical or conceptual organization tasks.
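The idea of matching any one of several expressions can be illustrated with a minimal Python sketch. It assumes simple case-insensitive substring matching; the function name `filter_phrases` and the sample phrases are hypothetical, and WordStat's own matching rules may differ.

```python
def filter_phrases(phrases, expressions):
    """Return the phrases that contain at least one of the expressions."""
    needles = [e.lower() for e in expressions]
    return [p for p in phrases if any(n in p.lower() for n in needles)]


# Hypothetical extracted phrases, for illustration only.
phrases = [
    "summary table",
    "see figure",
    "customer service",
    "raw data file",
    "main results",
]

# A phrase is kept when it matches ANY of the expressions.
selected = filter_phrases(phrases, ["table", "results", "data", "figure"])
```

Because this sketch uses substring matching, an expression such as "data" would also match "database"; a stricter filter would compare whole words instead.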

13. Persistent List of Non-Informative Phrases

A new feature allows users to mark non-informative phrases and add them to a project-specific ignore list. These phrases will be excluded from future extraction and suggestion steps. The list persists across sessions and can be edited or cleared at any time, helping maintain clean, relevant, and focused suggestions.

What’s new in WordStat 2025

Generative AI has revolutionized text analysis, but it still faces challenges on more complex tasks. It also has limited scalability, potential biases and high cost, as well as lack of transparency. WordStat 2025 bridges the gap by seamlessly integrating generative AI with existing advanced text mining, NLP, and machine learning techniques, offering researchers complete control. Choose from leading AI engines, customize prompts, and even create your own for tailored analysis. This release also brings major enhancements to existing features and introduces a groundbreaking tool for exploring differences in co-occurrences, delivering deeper insights with precision and ease. A PDF documenting the new features can be downloaded by clicking here.

1. Choice of several AI engines and models.

WordStat 2025 offers the possibility to perform text analysis and transformation using the engine of your choice among six online options (OpenAI, Gemini, Claude, Mistral, Perplexity, or DeepSeek) and one offline option (Ollama). Users may also choose the model that provides the performance they need at a price that fits their budget.

2. New AI-Powered Text Analysis and Transformation Routines

We evaluated multiple AI-driven features and integrated those that demonstrated strong performance, in some cases surpassing existing WordStat capabilities, though with trade-offs in processing speed and cost. Among the most effective implementations are sentiment analysis, pros and cons extraction, spelling correction, and automatic translation, all of which deliver highly accurate results across most AI engines. Additionally, we introduced AI-assisted readability scoring, lemmatization, segmentation of Asian languages (Chinese, Japanese, Thai, etc.), as well as grouping of Vietnamese monosyllabic tokens.

3. AI Text Extraction and Summarization

The performance of many generative AI tasks tends to decline with very long documents. One effective solution is to break these documents into shorter, topic-focused subdocuments by extracting all segments relevant to a specific theme. The AI EXTRACTION / SUMMARIZATION feature in WordStat enables users to perform such extractions across all cases in a project, saving the results to a new document variable. Beyond simple extraction, this feature also allows users to generate either a general summary of the full document or a topical summary based on a user-defined theme. By narrowing the scope of analysis to specific issues, these focused segments can then be used for more accurate and meaningful follow-up queries, significantly improving the relevance and precision of GenAI responses.

4. AI-Enhanced Naming of Topic Models

While generative AI engines are still unable to perform topic modeling on large datasets, they excel at providing relevant names for the extracted topics. WordStat already provided topic names, but it is now possible to obtain more descriptive AI-generated names.

5. AI Generated Grouping of Topics

A new AI script can be used to automatically identify grouping categories for extracted topics and then add the theme descriptor to the topic table. Saving the topics to a dictionary then produces a hierarchical dictionary with themes at the first level, topics at the second level, and words and phrases at the third level.

6. AI Syntactic Classification of Phrases

WordStat 2025 enhances its phrase extraction routine by offering the possibility to use AI to classify extracted phrases based on their syntactic category. Once phrases are extracted, AI can be used to assign them to grammatical classes such as noun phrases, verb phrases, adjective phrases, prepositional phrases, etc. This categorization offers a more structured analysis, allowing users to focus on specific themes or topics, behaviors, processes, emotions, or other elements associated with different syntactic phrase types.

7. AI-Based Named Entity Classification

Traditional named entity recognition struggles with ambiguous terms and context-dependent meanings. WordStat 2025 introduces AI-powered entity classification which takes into account the surrounding context to accurately categorize names, organizations, locations, and more, reducing errors and improving precision in entity classification.

8. Custom AI Scripts for Post-Processing and Analysis

WordStat 2025 allows users to create custom scripts for post-processing table outputs or for applying text analysis or data transformation to the text dataset. With this flexibility, you can tailor the software to meet your specific needs by automating repetitive tasks, applying custom preprocessing operations, or integrating external routines.

9. Monitor AI Token Consumption

Before executing any prompt, WordStat displays a detailed cost estimate based on the selected model and input length. Users can view both the estimated number of input tokens and the associated cost, along with the corresponding cost per million output tokens (since it is impossible to estimate how long the output will be). This makes it easy to compare the efficiency and pricing of different engines and models, helping users make informed decisions and avoid costly operations, especially when processing large datasets or running batch analyses.

To further support budget management, WordStat tracks cumulative token usage and costs by models, engines, and projects. This comprehensive monitoring allows you to allocate resources more efficiently, avoid unexpected overages, and maintain full transparency over your AI-related expenses.
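The arithmetic behind such an estimate is simple, as the following Python sketch suggests. The model names, prices, and project name are hypothetical placeholders, and the sketch is not a description of how WordStat computes or stores these figures.

```python
from collections import defaultdict

# Hypothetical prices in USD per million input tokens, for illustration only.
PRICES = {"model-a": 2.50, "model-b": 0.50}

# Cumulative input cost keyed by (project, model).
usage = defaultdict(float)


def input_cost(input_tokens, usd_per_million_input):
    """Cost of the input side of a prompt, in USD."""
    return input_tokens / 1_000_000 * usd_per_million_input


def record(project, model, input_tokens):
    """Estimate the input cost of one prompt and add it to the running total."""
    cost = input_cost(input_tokens, PRICES[model])
    usage[(project, model)] += cost
    return cost


first = record("survey-2025", "model-a", 400_000)
second = record("survey-2025", "model-a", 200_000)
cheap = record("survey-2025", "model-b", 400_000)
```

Only the input side can be priced in advance this way; the output cost depends on how many tokens the model actually generates.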

10. Explore Co-Occurrence Comparison Across Groups

WordStat 2025 introduces a powerful feature for comparing co-occurrence patterns between groups, revealing subtle differences that traditional frequency-based methods may miss. While frequency analysis highlights general trends, it overlooks how words or topics may be associated in unique ways within each group. This tool helps identify contextual variations, such as how the same term may be linked to different concepts, concerns, issues, or proposed solutions. Comparisons can be made within subgroups of the current dataset or by using previously saved co-occurrence matrices from different datasets. We are confident that this tool will unlock numerous new possibilities and potentially generate a wealth of interesting discoveries for researchers and data analysts.
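To illustrate why co-occurrences can differ even when frequencies do not, here is a minimal Python sketch. It assumes document-level co-occurrence and a plain difference in rates between two groups; the function names and sample texts are hypothetical, and WordStat's actual statistics are not specified here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_rates(docs):
    """Share of documents in which each unordered word pair appears together."""
    counts = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    return {pair: n / len(docs) for pair, n in counts.items()}


def compare_groups(docs_a, docs_b):
    """Word pairs ranked by the absolute difference in rate (group A minus group B)."""
    a, b = cooccurrence_rates(docs_a), cooccurrence_rates(docs_b)
    diffs = {p: a.get(p, 0.0) - b.get(p, 0.0) for p in set(a) | set(b)}
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)


# "price" occurs in every document of both groups, but with different companions.
group_a = ["price high", "price high", "price fair"]
group_b = ["price fair", "price fair", "price high"]
differences = dict(compare_groups(group_a, group_b))
```

In this toy example the word "price" is equally frequent in both groups, yet it is paired more often with "high" in the first group and with "fair" in the second.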

11. Explore Topic Relation with Interactive MDS Plot

WordStat 2025 introduces an interactive MDS plot that visualizes topic correlations in a two-dimensional map. Topics are represented as points or bubbles whose sizes represent frequencies or measured coherence, with distance indicating correlation strength: closely placed topics are highly correlated, while those farther apart show weaker or negative correlations. Clicking on a topic reveals a correlation table, allowing you to explore related topics in descending order of correlation strength. This feature offers a clear, dynamic way to understand and explore topic relationships.

12. New Topic Diversity Measure

Topic modeling results include a new measure for assessing topic diversity. This measure helps evaluate how varied the topics are within a model, providing a clearer understanding of the range and uniqueness of topics discovered.
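The exact formula used by WordStat is not given here. As a rough illustration only, one definition found in the topic modeling literature is the proportion of unique words among the top words of all topics, sketched below in Python with hypothetical topics.

```python
def topic_diversity(topics, top_n=10):
    """Proportion of unique words among the top_n words of every topic.

    A value of 1.0 means no top word is shared between topics; lower values
    indicate more overlap. This is an assumed definition, not WordStat's.
    """
    top_words = [word for topic in topics for word in topic[:top_n]]
    return len(set(top_words)) / len(top_words)


# Hypothetical topics, each listed by descending word importance.
topics = [
    ["price", "cost", "fee"],
    ["staff", "service", "price"],
    ["room", "bed", "clean"],
]
diversity = topic_diversity(topics, top_n=3)
```

Here eight of the nine top words are unique, because "price" appears in two topics.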

13. Improved Dictionary Assistance Panel

The Dictionary Assistance panel used to assess interactions between dictionary entries and evaluate the impact of wildcards now includes the ability to display the frequency of words matching the selected pattern in the current dataset. Additionally, panel entries can be filtered to show only items that are present in the dataset, streamlining the analysis process.

What’s New in Version 2024?

We’re delighted to announce the release of WordStat 2024. This new version includes important speed optimizations as well as several useful features for exploring large text collections in greater detail and in a more focused way. Here are some new and improved features:

1. Optimized Initial Text Processing

In Version 2024, WordStat reads and processes data files much more efficiently, so the first results appear sooner. This improvement is particularly pronounced for projects comprising a large number of small documents, with the first operation now completing up to three times faster than in previous versions.

2. Apply Automatic Document Classification Models to Project Data

WordStat 2024 now allows the application of automatic document classification models for data transformation. Found on the Data page, this new feature facilitates the storage of predicted classes in a variable. Users can also opt to store the probability of the predicted class or probabilities of all classes.

3. Improved Language Detection

WordStat 2024 introduces a new language identification classification model that can identify 68 languages accurately. Its measured accuracy is above 95% on very small text segments (six words or less) and above 98% on longer documents. The classification model may be applied to create a language variable in a multilingual project, allowing one to filter and analyze different languages separately.

4. Filter KWIC tables on up to three criteria

A new filter option on the Keyword-in-Context page may now be used to filter the results of the KWIC table on any selected independent variables or on words or phrases appearing before or after the key item.

5. Select or Exclude Paragraphs using Text Filters

An advanced preprocessing feature enables users to filter in or out paragraphs containing specific words or phrases. Multiple items can be specified, and each may begin or end with an asterisk to represent zero, one, or several additional characters. One may also select a specific number of paragraphs before or after the matching paragraph in order to include or exclude the surrounding context. This feature is especially useful for quickly focusing the text analysis on specific topics in large datasets without the need to create a new dataset. When analyzing interview or focus group transcripts, it may also be used to remove the interviewer’s questions or the moderator’s interventions if these are clearly indicated by specific key strings.
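The logic of such a filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the whole-word matching rule, and the sample transcript are hypothetical and are not taken from WordStat.

```python
import re


def compile_item(item):
    """Turn an item such as 'pric*' or '*ation' into a case-insensitive regex.

    A leading or trailing asterisk stands for zero or more word characters.
    """
    core = re.escape(item.strip("*"))
    prefix = r"\w*" if item.startswith("*") else ""
    suffix = r"\w*" if item.endswith("*") else ""
    return re.compile(rf"(?<!\w){prefix}{core}{suffix}(?!\w)", re.IGNORECASE)


def select_paragraphs(paragraphs, items, before=0, after=0, exclude=False):
    """Keep (or drop, if exclude=True) matching paragraphs and their context."""
    patterns = [compile_item(i) for i in items]
    hits = set()
    for idx, text in enumerate(paragraphs):
        if any(p.search(text) for p in patterns):
            start = max(0, idx - before)
            stop = min(len(paragraphs), idx + after + 1)
            hits.update(range(start, stop))
    return [p for i, p in enumerate(paragraphs) if (i in hits) != exclude]


# Hypothetical transcript, one paragraph per list entry.
paragraphs = [
    "Interviewer: How was the service?",
    "The staff were friendly and quick.",
    "Interviewer: Any issues with pricing?",
    "Prices were too high for the economy plan.",
    "Nothing else to add.",
]

# Drop the interviewer's questions, identified by a key string.
answers_only = select_paragraphs(paragraphs, ["Interviewer:"], exclude=True)

# Keep paragraphs mentioning pric* plus one paragraph of following context.
pricing = select_paragraphs(paragraphs, ["pric*"], after=1)
```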

6. Save Text Retrieval Results to a New Project File

When using the keyword retrieval function, a new button now allows one to save the obtained table as a new project file. This includes preserving options from the current project, such as pre- and post-processing settings, and the link to the categorization model. This feature proves especially handy for in-depth analysis of text segments on a specific topic or meeting specific conditions.

7. Copy Graphics’ Data to the Clipboard in Text Format

It is now possible to save data utilized for creating various graphics to the clipboard in tab-delimited format. One may then seamlessly paste the data in another application to generate tables or custom charts.

8. More Ways to Add Items to the Categorization Dictionary

Right clicking on items of a 2D correspondence plot or of a deviation table now allows one to add those items to an existing or a new category. This option is disabled or hidden when the selected item is already present in the current categorization model or refers to a content category.

9. Filtering and Highlighting of Suggested Phrases.

New phrase extraction options have been added to remove or highlight (bold + italic) phrases already in the current categorization dictionary. Additionally, a filtering mechanism has been introduced, allowing users to set a minimum frequency to narrow down suggested phrases.

What’s New in Version 2023?

1. Improved Topic Enrichment

WordStat now adds more relevant phrases to the extracted topics, while also offering improved suggestions for additional phrases. Additionally, it now boasts greater accuracy in identifying false positive expressions, or exceptions, which can be incorporated into the topic model to help disambiguate words associated with contexts unrelated to the extracted topics.

2. Topic Modeling Word Cloud

The comparison panel on the right-hand side of the topic model table now features a newly added word cloud that visually depicts the relative importance of the top words within the selected topic. This word cloud can be customized, copied to the clipboard, or saved to disk in standard graphic formats like BMP, PNG, or JPEG.

3. New integrated text retrieval feature

A new convenient Sample Text panel on the right-hand of the topic grid can be activated to automatically display sentences or paragraphs that match the selected topic. These text segments are presented in descending order of relevance, with topic words displayed in bold, making it easy to understand the essence of each topic and identify key examples that can be used to illustrate it. This powerful tool provides users with a deeper understanding of their data and facilitates more effective communication of their findings.

4. Improved Topic Enrichment Speed

Thanks to significant optimization efforts, the topic enrichment process has been dramatically accelerated, resulting in performance gains of up to 10 to 20 times faster than previous versions.

5. Instantaneous Phrase Extraction

Leveraging the power of multicore processing, phrase extraction is now seamlessly integrated with the main text processing, enabling users to access results almost instantaneously. For instance, on a dataset of over 50,000 customer reviews, extracting the 5000 most frequent phrases can now be completed in just 0.4 seconds, compared to the 14 seconds required by the previous version.

6. Importation of 10-K and 10-Q Financial Filings (2023.1)

A new importation routine enables users to import specific sections of 10-K and 10-Q financial filings, and store them separately or merge them into single documents. The extraction routine automatically recognizes the company’s name and time period (quarter and year) and stores them as variables for easy analysis.

7. Export Text Analysis Results to Power BI (2023.1)

WordStat now offers seamless integration with Microsoft Power BI, allowing users to export text analysis results and metadata to Power BI Desktop for interactive dashboards and reports. Users can then create compelling visualizations, gain deeper insights from their data, and easily share their findings with others.

8. Push Co-occurrence Data to Gephi or NetDraw (2023.1).

With the new option available from the Dendrogram page, users can now export co-occurrence data, along with additional information such as frequency and cluster number, to social network analysis software like Gephi and NetDraw. These tools provide powerful visualizations that help users identify patterns and relationships within their data. Gephi offers layout algorithms and interactive features for real-time exploration, while NetDraw provides visualization options for network graphs.

9. Custom Chart Palettes (2023.1)

WordStat 2023 introduces a new feature that allows users to create custom color palettes. This feature provides greater control over the colors used for charts, word clouds, clustering, and other visualizations, enabling users to customize their output to suit their specific needs.

10. Circular Dendrograms (version 2023.2)

The new circular dendrograms use space more efficiently than vertical ones. A vertical dendrogram can become quite tall, requiring a lot of vertical space to display. Circular dendrograms, on the other hand, arrange the branches in a circle, which is more compact. They may also be more visually appealing in presentations and publications.

11. Topic clustering and inter-topic similarity information (version 2023.2)

A new topic modeling option allows one to perform cluster analysis on a topic solution. An additional column will display the top 3 most similar topics. It will also order topics so that similar topics will tend to be close to each other.

12. Topic correlation table (version 2023.2)

The topic statistics dialog box now includes a topic-topic correlation table allowing one to assess similarity between topics. Columns can be sorted to order topics in ascending or descending order of similarity, and an optional heatmap can highlight related items in shades of green (positively correlated) or red (negatively correlated).

Introduced in Version 2022

1. Highly optimized topic modeling with factor analysis

In WordStat 2022, we implemented a new multithreaded factor analysis routine that is up to 65 times faster than prior versions. It means that large problems that would have taken an hour to compute can now be obtained in less than a minute. We were also able to increase the factor analysis capacity to 10,000 words (from 3,000 in prior versions).

Our own research efforts have shown that topic modeling using factor analysis produces topic solutions that are more coherent as well as more diverse than topic modeling techniques relying on LDA and neural network techniques (Peladeau & Davoodi, 2018; Peladeau, 2022). It also has the additional benefit of being stable, yielding identical results every time. However, its main drawbacks have always been its speed and capacity. This led us to implement, in WordStat 8, a special topic extraction routine using non-negative matrix factorization (or NMF). This technique yields results much more quickly, and they are quite similar to those obtained using factor analysis. However, its probabilistic implementation causes results to differ slightly from one run to the next, which some researchers find somewhat disturbing. It is important to note that almost all other popular topic modeling techniques in computer science produce topic solutions that are even more unstable than our custom implementation of NMF. The much-improved speed and capacity of the new factor analysis topic modeling routine will likely be appreciated by those looking for optimal and stable topic solutions.

2. Improved suggestions on the Frequencies page

The Suggestions panel in prior versions of WordStat displayed synonyms, antonyms, and related words for languages for which a thesaurus was available. It also presented words starting with the same initial letters, allowing one to identify some misspellings as well as related words. A new Associated Words section now retrieves from the text corpus other words semantically, syntactically, and statistically related to the selected word(s) in the frequency table. This new feature should work in any language. Entries will be listed, by default, in descending order of relevance. Synonyms, antonyms, and related words will also be sorted in descending order of relevance, facilitating the identification of appropriate suggestions. One is still able to sort those entries alphabetically or in descending order of frequency. Also, a new frequency filtering option lets one filter out low-frequency suggestions in order to focus on more frequent ones.

Since this new way of extracting related words and ordering suggestions is language-independent, it will be especially useful for people analyzing languages for which there is no thesaurus. Yet, we found that even when such linguistic resources are available, the additional suggestions based on the contextual use of words, and the sorting of existing synonyms and related words by relevance should greatly facilitate the identification of appropriate items.

3. New suggestions tab for the phrase extraction routine.

The Overlap panel has been replaced with a Suggestions panel, displaying phrases semantically, syntactically, or statistically related to the selected row(s) in the phrase frequency table, in addition to the overlapping phrases. This feature is also language-independent.

4. Improvement to Named Entity Recognition.

A new Related panel has been added to the Named Entity Recognition page. Selecting a single named entity will bring up related named entities, as well as those belonging to the same class (people, place, organization, etc.). Selecting more than one example of a specific class (for example, several cities) will also retrieve more items belonging to this class. A contextual menu also allows one to move any item to the categorization dictionary or to the exclusion list. A keyword-in-context search may also be performed on selected suggestions.

5. Highlighting of contextual words in keyword-in-context tables.

When assessing words in a categorization dictionary or candidates for inclusion, one often needs to look at the presence of additional keywords in the context in which a target word or phrase appears. A new highlighting feature allows one to specify a list of words and phrases to look for in the surrounding context of the word. This list is automatically populated when the KWIC list is called from the topic modeling page or from the dendrogram, or when assessing items in a content category containing multiple entries.

6. Filtering items in correspondence plots on frequency or distance from the origin.

Correspondence plots of more than a few hundred items may create a dark mass of overlapping items at the center of the plot (the origin). A new slider control has been added to hide items that are either less frequent or close to this origin. Unless one wants to identify what is common to all classes of an independent variable, the most interesting items are those that are far from the origin, since they are the ones that characterize different classes. Filtering out the items near the origin allows one to identify differentiating items more easily.

7. Improved keyword retrieval

Results of a keyword search are now sorted in descending order of relevance, taking into consideration both the frequency and variety of matched items in relationship with the length of the retrieved text segment. A new frequency column can also be used to sort on frequencies only.

8. Computation of a string variable by concatenation

A new data transformation command allows one to compute a string variable by concatenating values of several existing variables (number, strings, dates, etc.) as well as typed text. Such a procedure may also be used to initialize a string variable with a constant string value.

9. Persistent comparison chart settings

The chart type and statistics as well as the color palette of comparison charts are now linked to the variable name and are stored in the project settings. Those options should remain constant across pages (frequencies, phrases, topic modeling, dendrogram, etc.) and between sessions, reducing the need to constantly readjust them.

What’s New in Version 9.0?

1. Full Unicode Support

We always try to select language-independent text analytics techniques. This has allowed users to analyze text data in more than 50 languages. However, to analyze languages not supported by their default Windows installation, the user needed to change some Windows settings. And while it was possible to analyze datasets in multiple languages, some combinations of languages were simply not possible. The new Unicode version of WordStat allows one to analyze any of these without any setting changes, as well as languages previously not supported such as Chinese, Japanese, or Thai. Word segmentation routines for these three Asian languages have also been added.

2. Integration of R and Python Pre- and Post-Processing Scripts

In 2018, we introduced the possibility to create Python preprocessing scripts in WordStat 8. Version 9.0 extends this capability by offering the possibility to create preprocessing scripts in R as well. More importantly, it is now possible to create post-processing scripts in those two programming languages, allowing one to perform custom analysis on the original or transformed text data or on quantified results obtained through content analysis of those documents. Such a feature offers endless possibilities to extend the features of WordStat, such as implementing new machine learning algorithms, advanced statistical modeling techniques, or custom data transformation. Sample scripts have been included to compute text readability metrics, detect languages, apply other topic modeling techniques (LDA or STM), or create predictive models using machine learning (SVM, kNN, etc.).

3. Automatic Spelling Correction

A new spell-checking engine has been written from scratch to achieve much faster and more accurate spelling corrections, allowing the implementation of an automatic spelling correction feature with minimal impact on the existing text processing speed of WordStat. The intelligent spelling correction can even correct the spelling of unknown terms such as technical vocabularies, proper nouns, etc. Results can be automatically saved to the substitution list for revision and corrections.

4. Crosstabulation with Charting Panels and Filtering

The crosstab page now includes a chart panel allowing one to quickly plot the distribution of selected rows of the crosstabulation table for the values of the currently selected variable or any other variable. A filtering list box also allows one to analyze such distributions for a single value or a set of values of the selected variable.

5. Interactive Co-occurrence Matrix

A new interactive matrix feature has been added to the co-occurrences page, allowing one to focus on specific co-occurrences. The main results consist of a table displaying a statistic chosen from various co-occurrence statistics. This matrix is also highly interactive, allowing one to transform specific rows into new columns or vice versa using simple drag-and-drop operations. A charting panel on the left also allows one to assess the distribution of a specific co-occurrence across other variables. One may also obtain a quick view of all text segments associated with a specific co-occurrence. This new feature of WordStat may also be called from the frequency list by selecting target items (words or content categories) that should be displayed as columns, right-clicking, and selecting Co-Occurrence Matrix.

6. Importation of Nexis UNI and Factiva Files

First introduced in QDA Miner 6.0 in 2020, the ability to import news transcripts from LexisNexis and Factiva output files is now also available in WordStat. After one or more .DOCX or .RTF files obtained from those services have been selected, WordStat extracts the title and body of each news transcript, its source, the publication date, and other relevant information, and stores them in separate variables. This feature should prove useful for reputation management, brand management, crisis communication, media framing analysis, comparative media studies, etc.

7. Batch Processing of Topic Models

Choosing the number of topics to extract with topic modeling techniques remains a question for which there is, to our knowledge, no definitive answer. One may even doubt whether such an optimal number exists, since solutions obtained using different settings may well serve different purposes or reveal different aspects of the same reality. Given this uncertainty, researchers often want to compare several solutions. The new batch processing feature allows one to compute multiple topic models by systematically varying the number of topics to extract and, for probabilistic methods (e.g., NNMF), to perform several runs using the same settings in order to assess the stability of the results. All topic model solutions are temporarily aggregated in the report manager, allowing one to compare solutions obtained in multiple runs using different settings.
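
One simple way to compare repeated runs is to measure how much the top words of each topic overlap from one run to the next. The Python sketch below is an assumption about how such a comparison could be done outside WordStat, not a description of how the batch feature works internally; the function names and the sample topics are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def run_stability(run_a: list, run_b: list) -> float:
    """Average, over the topics of run_a, of the best Jaccard match found in run_b."""
    if not run_a or not run_b:
        raise ValueError("each run must contain at least one topic")
    best = [max(jaccard(topic, other) for other in run_b) for topic in run_a]
    return sum(best) / len(best)


# Hypothetical top words from two runs using identical settings.
run_1 = [{"tax", "income", "budget"}, {"school", "teacher", "student"}]
run_2 = [{"teacher", "student", "class"}, {"tax", "budget", "income"}]

if __name__ == "__main__":
    print(run_stability(run_1, run_2))
```

A value close to 1 suggests that both runs recovered nearly the same topics. The measure is not symmetric, so when runs differ in their number of topics it may be worth computing it in both directions.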

8. Create Word Clouds on Keyword Retrievals & KWIC Results

Interactive word clouds and word frequency tables can now be obtained directly from keyword retrieval and keyword-in-context (KWIC) results, allowing one to quickly identify words associated with specific content categories, or those appearing before or after a specific target item.

9. More Powerful Proximity Rules

The number of conditions in proximity rules has been increased from four to a maximum of twenty conditions. If you believe it is not enough, let us know.

10. Preview Effect of Wildcards and Dictionary Interactions

Using wildcards in a dictionary is quite powerful yet potentially troublesome, since a wildcard may match items that you had not thought of. For example, an entry like TAX* will match TAX, TAXES, and TAXATION, but will also match words such as TAXI, TAXONOMY, TAXIDERMY, etc. In addition, the rules WordStat applies for matching items and preventing double-counting may produce unexpected results because of other entries in your categorization model. A new panel on the right of the exclusion and categorization pages allows you to easily identify the new entries that would be matched by a wildcard (*) at the end of a word, as well as possible conflicts with other entries in your dictionary.
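
The effect of a trailing wildcard can be illustrated with a short Python sketch that lists which words of a vocabulary a pattern would capture. It uses the standard fnmatch module as a stand-in; WordStat’s own matching and double-counting rules are not reproduced here, and the function name and word list are hypothetical.

```python
from fnmatch import fnmatchcase


def preview_wildcard(pattern: str, vocabulary: list) -> list:
    """Return, in sorted order, the vocabulary words matched by a pattern.

    Matching is made case-insensitive by upper-casing both sides.
    Note that fnmatch also treats "?" and "[...]" as special characters.
    """
    pat = pattern.upper()
    unique_words = {word.upper() for word in vocabulary}
    return sorted(word for word in unique_words if fnmatchcase(word, pat))


# Hypothetical vocabulary taken from a corpus.
vocabulary = ["tax", "taxes", "taxation", "taxi", "taxonomy", "taxidermy", "text"]

if __name__ == "__main__":
    for word in preview_wildcard("TAX*", vocabulary):
        print(word)
```

Run on this vocabulary, the preview lists TAXI, TAXONOMY, and TAXIDERMY next to the intended TAX, TAXES, and TAXATION, which is the kind of unintended match worth reviewing before the entry is added to a dictionary.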

11. Password Protection of Project Files

WordStat 9.0 now makes it possible to password-protect project files, restricting access to specific projects to authorized users. A dialog box allows the project administrator to create new user accounts and specify which operations each user can perform. One may limit data editing, data importation, or transformation, as well as the exportation of project data, tables, and graphics. Alternatively, you may let users perform any transformation they want but prevent them from saving the project file.

12. New Options for Cleaning Data

The preprocessing page now includes options to automatically remove URLs from text messages as well as speakers’ designations in news and interview transcripts.
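
To make the two cleaning operations concrete, here is a minimal Python sketch using regular expressions. It is not WordStat’s implementation: the patterns and the function name are assumptions, and the speaker pattern only covers labels written in capital letters at the start of a line and followed by a colon (e.g. INTERVIEWER:).

```python
import re

# Assumed URL pattern: anything starting with http://, https:// or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
# Assumed speaker label: capitalized name at the start of a line, then a colon.
SPEAKER_RE = re.compile(r"^[A-Z][A-Z .'\-]{0,40}:[ \t]*", re.MULTILINE)


def clean_text(text: str) -> str:
    """Remove URLs and upper-case speaker labels, then tidy up spacing."""
    text = URL_RE.sub("", text)
    text = SPEAKER_RE.sub("", text)
    text = re.sub(r"[ \t]{2,}", " ", text)  # collapse gaps left by removed URLs
    return "\n".join(line.strip() for line in text.splitlines())


if __name__ == "__main__":
    transcript = (
        "INTERVIEWER: Did you see https://example.com/report yesterday?\n"
        "JANE DOE: Yes, I did."
    )
    print(clean_text(transcript))
```

A pattern this simple would also strip capitalized line openers that are not speakers (such as NOTE:), so the results should be reviewed on a sample before the cleaned text is analyzed.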

13. New Stacked Area Charts

The charting feature of the Crosstab page adds the possibility to create two types of stacked area charts.

14. Colored Items in Correspondence Plot

Color gradients may now be used to represent the position of specific items or variable classes on the third (depth) dimension of 2D and 3D correspondence plots. Up to four colors may be chosen to create those gradients.

15. Improved Bubble Chart

It is now possible to transpose rows and columns of bubble charts.

16. Link Analysis Buffer

A link analysis buffer allows one to move back to previous link diagrams and then forward.

17. Faster and More Precise Topic Enrichment

WordStat goes beyond typical topic modeling, offering a unique topic enrichment feature that identifies associated phrases, potential exceptions, and misspellings. It also generates relevant topic names automatically. With version 9, this topic enrichment feature is twice as fast as before and performs better word-sense disambiguation, producing a more accurate list of exceptions. It also provides better suggestions for spelling corrections.

18. Improved Speed and Accuracy of Existing Spelling Corrections

The existing spelling correction feature is now up to 30x faster, requiring only a second or two to suggest spelling corrections for tens of thousands of unknown words.

19. New .PPRJ File Format

A new file format with a new file extension (.pprj) was created, providing improved support for Unicode data. However, WordStat 9 retains backward compatibility with the prior versions of all our software and can open and analyze current project files (.ppj) created by QDA Miner, SimStat, or older versions of WordStat.

20. Numerous Additional Improvements

Several additional options and interface improvements have been made to existing dialog boxes, graphics, data management, and data analysis features.

New features of WordStat 8 can be viewed here