SCOT modelling, parallel training and statistical inference

A Stochastic COntext Tree (abbreviated as SCOT) is an m-Markov chain in which every symbol of a string is independent of the symbols in its past more remote than a context whose length is determined by the preceding symbols. SCOT has also appeared in other fields under somewhat confusing names such as VLMC, PST, and CTW, the last in compression applications. Strong theoretical results establishing consistency of algorithms for training SCOT have been derived for stationary time series with mixing (STSM). We introduce conditions for SCOT equivalence to a 1-MC over the context set, find its stationary distribution and entropy rate, and prove various limit theorems.

Efficient software for fitting SCOT is available that can handle large data sets over alphabets of moderate size (<28). We describe a fast computational approach that fits SCOT in parallel on a cluster of computers, with an appropriate alphabet chosen during training; the alphabet size is first compressed by adaptive software implementing the Ryabko algorithm. Our new SCOT fitting algorithm can potentially choose a subset of words, rather than individual language symbols, for training, exploiting the fact that words are interrelated as a network. Processing SCOT models with such vast alphabets would greatly improve the power to discriminate between different topics and styles, and would expose the most discriminating patterns via approximation of the SCOT stationary distribution.

We outline statistical applications to hypothesis testing and to prediction of time series arising in fields ranging from natural language processing to financial modelling. Our studies indicate that SCOT-based change-point detection in financial markets can significantly outperform classical GARCH models. This is a strong claim, and we are continuing to collect a richer body of evidence; if confirmed, this data-driven discovery would clearly indicate the inadequacy of the classical models and suggest replacing them with SCOT-based ones. In all explored applications we uncover a complex sparse structure of memory in SCOT models that yields excellent discrimination power. In addition, straightforward estimation of the SCOT stationary distribution gives insight into the contexts crucial for discriminating between, say, different regimes of financial data or the styles of different authors of literary texts.
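To make the variable-memory definition concrete, the following is a minimal Python sketch of context lookup in a SCOT/VLMC; the class layout, names, and the toy two-symbol model are our illustrative assumptions, not the interface of the software mentioned above.

class SCOT:
    """Minimal SCOT/VLMC sketch: a suffix-free set of contexts, each
    carrying a next-symbol distribution (illustrative, hypothetical)."""

    def __init__(self, contexts):
        # contexts: dict mapping a context (a suffix of the past, e.g. "ba")
        # to a dict of next-symbol probabilities.
        self.contexts = contexts
        self.m = max(len(c) for c in contexts)  # maximal memory depth

    def context_of(self, past):
        # Longest suffix of the past that is a context; by the SCOT
        # property, symbols further back are irrelevant.
        for k in range(min(self.m, len(past)), 0, -1):
            if past[-k:] in self.contexts:
                return past[-k:]
        raise ValueError("past too short to determine a context")

    def next_symbol_distribution(self, past):
        return self.contexts[self.context_of(past)]

# Toy model over the alphabet {a, b}: memory depth varies with the context.
model = SCOT({
    "b":  {"a": 0.8, "b": 0.2},   # one symbol of memory suffices after 'b'
    "aa": {"a": 0.1, "b": 0.9},   # two symbols needed after '...aa'
    "ba": {"a": 0.5, "b": 0.5},   # two symbols needed after '...ba'
})
print(model.next_symbol_distribution("ababa"))  # past ends in "ba"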
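The equivalence to a 1-MC over the context set admits a standard display (our notation, not necessarily the paper's): writing c(wa) for the unique context that is a suffix of wa, the induced chain on the context set W has transition probabilities

\[
Q(w, w') \;=\; \sum_{a \,:\, c(wa) = w'} P(a \mid w), \qquad
\pi Q = \pi, \quad \sum_{w \in W} \pi(w) = 1,
\]

and the entropy rate of the SCOT then follows from the stationary distribution \(\pi\):

\[
h \;=\; -\sum_{w \in W} \pi(w) \sum_{a \in A} P(a \mid w) \log P(a \mid w).
\]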
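For the change-point application, a generic log-likelihood-ratio (CUSUM) scan between two fitted SCOT models illustrates how such a detector can be assembled. This is a textbook construction reusing the sketch above, not the paper's actual test; the threshold choice is left open.

import math

def cusum_scan(series, model_before, model_after, threshold):
    # Accumulate the log-likelihood ratio of the "after-change" model
    # against the "before-change" model, reflected at zero (CUSUM);
    # flag a change when the statistic exceeds the threshold.
    stat = 0.0
    for t in range(2, len(series)):  # short warm-up so a context exists
        past, x = series[:t], series[t]
        p0 = model_before.next_symbol_distribution(past)[x]
        p1 = model_after.next_symbol_distribution(past)[x]
        stat = max(0.0, stat + math.log(p1 / p0))
        if stat > threshold:
            return t  # first alarm: estimated change time
    return None  # no change detected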