Topistry: visualization for document analysis

Topistry is an interactive topic-based visual analytics system for multi-theme documents.

Combining computational analysis with interactive visualization is becoming increasingly important in visual analytics for understanding large-scale textual information. Topic modeling algorithms have been used successfully to gain a high-level understanding of document contents by revealing latent topics in document collections. Current topic modeling algorithms, however, have two limitations. First, they typically require the number of topics in the document set to be specified explicitly, and that number may be difficult to determine in advance. Second, each document can be related to multiple topics depending on its contents. A topic modeling algorithm provides every document’s dependency on all topics, which allows the multiple themes of documents to be analyzed, but analyzing the complex multi-topic (i.e., multi-theme in this paper) relations among documents is inefficient because documents are not explicitly partitioned, as they would be by a clustering algorithm.
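As a rough illustration of the second point, the sketch below flags multi-theme documents from a document-topic weight table. It is not Topistry's method: the function name themes_per_document, the 0.25 threshold, and the weight values are all hypothetical, chosen only to show how per-document topic dependencies could be turned into explicit theme memberships.

```python
from typing import Dict, List


def themes_per_document(
    doc_topic: Dict[str, List[float]], threshold: float = 0.25
) -> Dict[str, List[int]]:
    """Return, for each document, the indices of topics whose weight meets the threshold."""
    return {
        doc: [k for k, weight in enumerate(weights) if weight >= threshold]
        for doc, weights in doc_topic.items()
    }


# Hypothetical document-topic weights (each row sums to 1), as a topic model might produce.
doc_topic = {
    "doc_a": [0.90, 0.05, 0.05],  # dominated by a single topic
    "doc_b": [0.45, 0.50, 0.05],  # split between two topics
    "doc_c": [0.30, 0.30, 0.40],  # spread over three topics
}

themes = themes_per_document(doc_topic)

# A document counts as multi-theme here when more than one topic passes the threshold.
multi_theme = sorted(doc for doc, topics in themes.items() if len(topics) > 1)
print(multi_theme)
```

Unlike a clustering result, this keeps a document in every theme it belongs to, at the cost of a threshold that has to be chosen by hand.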

Topistry implements efficient interactions within an aesthetic visualization to support the analysis of multi-theme documents through top-down, bottom-up, and random investigation approaches. Furthermore, the interactive visualization is fully coupled with the topic modeling algorithm, which maximizes the benefits of the algorithm while users steer the analysis during investigation. To help users understand the overall structure and the detailed relations within a document collection, the system implements several text mining algorithms, including topic modeling, document similarity metrics, and summarization.
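The description does not say which document similarity metrics are used. As one common possibility, the sketch below computes cosine similarity over simple word counts; the choice of metric, the whitespace tokenization, and the function name cosine_similarity are assumptions made for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts using lowercase whitespace-token counts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Dot product over the terms the two documents share.
    dot = sum(counts_a[term] * counts_b[term] for term in counts_a.keys() & counts_b.keys())

    # Product of the Euclidean norms of the two count vectors.
    norm = math.sqrt(sum(c * c for c in counts_a.values())) * math.sqrt(
        sum(c * c for c in counts_b.values())
    )
    return dot / norm if norm else 0.0


same = cosine_similarity("topic models reveal latent topics", "topic models reveal latent topics")
disjoint = cosine_similarity("visual analytics", "clustering algorithms")
partial = cosine_similarity("topic modeling for documents", "topic modeling for visualization")
print(same, disjoint, partial)
```

Identical texts score close to 1, texts with no shared words score 0, and partially overlapping texts fall in between.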

This project was led by Jaeyeon Kihm and advised by Prof. John Stasko.