interHist - an interactive visualization for statistically enhanced query structures

© Copyright 2012-2013 Accademia Europea Bolzano

This is a sample application of interHist, an interactive visualization for statistically enhanced query structures. The demo shows the results of a query for Italian noun phrases. The data is taken from the free PAISÀ Corpus of Italian web texts (www.corpusitaliano.it).
The structure of complex noun phrases in Italian (cf. Renzi, 1991) is approximated by the following query:

  predet? (det | dem/poss pronoun)? adj* noun (adj | verb ending in ti/te/to/ta)?

It yields more than 24 million results, which are condensed into the interHist visualization based on the part-of-speech tags (see here for details on the tagset) assigned to the tokens of each concordance line.
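
As a rough illustration of how such a query constrains a tag sequence, the following Python sketch encodes each token as a single symbol and checks the symbol string against a regular expression. The tag names (PREDET, DET, DEM, POSS, ADJ, NOUN, VERB) and the helper functions are hypothetical placeholders, not the PAISÀ tagset or the query syntax actually used by the demo.

```python
import re

# Hypothetical tag names chosen for illustration; the real tagset differs.
SYMBOLS = {
    "PREDET": "P",
    "DET": "D",
    "DEM": "D",   # demonstrative pronoun
    "POSS": "D",  # possessive pronoun
    "ADJ": "A",
    "NOUN": "N",
}

# predet? (det | dem/poss pronoun)? adj* noun (adj | verb ending in ti/te/to/ta)?
PATTERN = re.compile(r"P?D?A*N[AV]?")


def symbol(word, tag):
    """Map one (word, tag) pair to a one-letter symbol; 'O' means 'other'."""
    if tag == "VERB":
        # Only verbs with a participle-like ending are allowed by the query.
        return "V" if word.lower().endswith(("ti", "te", "to", "ta")) else "O"
    return SYMBOLS.get(tag, "O")


def matches_query(tokens):
    """Return True if the whole token sequence fits the noun phrase pattern."""
    encoded = "".join(symbol(word, tag) for word, tag in tokens)
    return PATTERN.fullmatch(encoded) is not None
```

Under these assumptions, a call such as `matches_query([("la", "DET"), ("bella", "ADJ"), ("casa", "NOUN")])` encodes the sequence as `DAN` and tests it against the pattern.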

Along the x-axis, one stacked histogram per token position shows the part-of-speech distribution at that position. The positions run from left to right, following the linear order of the token sequences. Part-of-speech types are encoded by color. Hovering over a bar in the histogram highlights the respective part-of-speech label in the legend to the right. The total number of results is displayed above the diagram.
The visualization allows for interactive filtering of the data. Clicking on a bar restricts the respective token position to the selected part-of-speech. The filtered results are visualized as a second sequence of stacked histograms next to the primary data.
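
The counting and filtering described above can be sketched in a few lines of Python. This is a minimal sketch, not the interHist implementation: it assumes each result is a list of part-of-speech tags and that positions are simply counted from the left, and the function names and tag labels are hypothetical.

```python
from collections import Counter


def position_histograms(sequences):
    """Count part-of-speech tags per token position (0 = leftmost)."""
    histograms = []
    for sequence in sequences:
        for position, tag in enumerate(sequence):
            if position == len(histograms):
                # First sequence that reaches this position: start a new count.
                histograms.append(Counter())
            histograms[position][tag] += 1
    return histograms


def filter_by_position(sequences, position, tag):
    """Keep only the sequences that carry `tag` at the given position."""
    return [
        sequence
        for sequence in sequences
        if len(sequence) > position and sequence[position] == tag
    ]


# Hypothetical tag sequences standing in for query results.
results = [
    ["DET", "NOUN"],
    ["DET", "ADJ", "NOUN"],
    ["NOUN", "ADJ"],
]

primary = position_histograms(results)
# Simulates a click on the DET bar at the first position.
filtered = position_histograms(filter_by_position(results, 0, "DET"))
```

Here `primary` corresponds to the first sequence of stacked histograms and `filtered` to the second sequence shown next to it after a click.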

Tips for using interHist: