Time-series data segmentation.


In the last post, we found that sales have a seasonal trend. In this post, I'd like to look at that trend in more detail.


Looking at the seasonal trend (blue line), there appear to be peaks in September, November and December. We can assume that there were promotions or other events in those periods. As you can imagine, sales conditions can change.

The goal of this post is to see how to catch this "change" with some methods. The reason for catching the change is that I feel we need to extract data observed under the same conditions in order to conduct statistical tests. In addition, if we extract the data with higher sales, we can see more precisely what contributes to sales. In other words, I feel this can help us segment the data.

Before we start, note that we will use the prepped global_superstore data that was made in this post.

I will use weekly data.


We need these packages.


1. Threshold Autoregressive Model (TAR Model)

This model assumes there is a threshold that determines which state the data is in.

For example, the TAR(1) model for 2 states is:
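A common way to write it is shown below. The coefficient names \phi_{j,0}, \phi_{j,1} and the noise term \varepsilon_t are notation assumed here for illustration; c is the threshold and s_t is the variable compared against it.

```latex
y_t =
\begin{cases}
\phi_{1,0} + \phi_{1,1}\, y_{t-1} + \varepsilon_t, & s_t \le c \\
\phi_{2,0} + \phi_{2,1}\, y_{t-1} + \varepsilon_t, & s_t > c
\end{cases}
```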


Let me skip the details. The essence of the TAR model is that, with an arbitrary threshold c and threshold variable s_t, we can describe 2 states that y_t follows.

In particular, if we choose y_(t-1) as s_t, it is called the Self-Exciting TAR model (SETAR model).
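To make the SETAR idea concrete, here is a minimal base R sketch on simulated data (not the sales data). The coefficients, the true threshold of 0, and the 15% to 85% search range are all assumptions chosen for illustration. It estimates the threshold with a simple grid search that minimizes the residual sum of squares of two separate AR(1) fits, which is one possible approach and not necessarily what a dedicated package does internally.

```r
# Simulate a two-regime SETAR(1) series (synthetic data, illustration only).
set.seed(42)
n <- 300
true_c <- 0                      # hypothetical threshold
y <- numeric(n)
for (t in 2:n) {
  if (y[t - 1] <= true_c) {
    y[t] <- 1.0 + 0.5 * y[t - 1] + rnorm(1, sd = 0.5)   # regime 1: y_(t-1) <= c
  } else {
    y[t] <- -1.0 + 0.3 * y[t - 1] + rnorm(1, sd = 0.5)  # regime 2: y_(t-1) > c
  }
}

lagged  <- y[-n]   # y_(t-1): regressor and threshold variable s_t
current <- y[-1]   # y_t

# Residual sum of squares when the sample is split at a candidate threshold.
ssr_for_threshold <- function(threshold) {
  low <- lagged <= threshold
  fit_low  <- lm(current[low] ~ lagged[low])
  fit_high <- lm(current[!low] ~ lagged[!low])
  sum(resid(fit_low)^2) + sum(resid(fit_high)^2)
}

# Search only central quantiles so each regime keeps enough observations.
candidates <- quantile(lagged, probs = seq(0.15, 0.85, by = 0.01))
ssr <- sapply(candidates, ssr_for_threshold)
c_hat <- unname(candidates[which.min(ssr)])

# Label each observation with its estimated regime (1 = low, 2 = high).
regime <- ifelse(lagged <= c_hat, 1L, 2L)
print(c_hat)
print(table(regime))
```

The `regime` vector is what we would use to split a series into segments that share the same condition.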

For example, this is the result of the SETAR model on the sales data.


It extracts most of the peaks in the data. If those peaks line up with the timing of promotions, we may be able to say that the promotions were effective enough to change the sales state.

The code for TAR is here.


However, it failed to catch the peak in 2012, although we would need advertising and promotion data to evaluate this result properly. In addition, as a matter of fact, the TAR model tends to give extreme results because it assigns states by a hard threshold. For example, if many data points lie close to the threshold, the regime sequence becomes very jagged.

2. Smooth Transition Model (ST Model)

There is a solution to this problem of the TAR model, called the Smooth Transition Model.

The following formula is the 2-state version.
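A common way to write the 2-state smooth transition AR(1) model is shown below. As before, the coefficient names \phi_{j,0}, \phi_{j,1} and the noise term \varepsilon_t are notation assumed here for illustration.

```latex
y_t = \left(\phi_{1,0} + \phi_{1,1}\, y_{t-1}\right)\bigl(1 - G(s_t)\bigr)
    + \left(\phi_{2,0} + \phi_{2,1}\, y_{t-1}\right) G(s_t)
    + \varepsilon_t
```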


Here G(s_t) is called the transition function. The model is like a weighted sum of the TAR model's two states. In fact, it includes TAR as the special case where G(s_t) takes only the values 0 or 1. Typically G() is logistic or exponential; if G() is logistic and each state is an AR model, the model is called LSTAR.
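As a small illustration, the logistic transition function is usually written as G(s) = 1 / (1 + exp(-gamma * (s - c))). The base R sketch below compares it with the hard TAR indicator. The parameter names gamma (steepness) and c0 (location), and the values used, are assumptions for illustration.

```r
# Logistic transition function: moves smoothly from 0 to 1 around c0.
logistic_G <- function(s, gamma, c0) {
  1 / (1 + exp(-gamma * (s - c0)))
}

s <- seq(-1, 1, by = 0.5)                        # values of the threshold variable
hard   <- as.numeric(s > 0)                      # TAR: hard 0/1 switch at c = 0
smooth <- logistic_G(s, gamma = 2,   c0 = 0)     # gentle transition
steep  <- logistic_G(s, gamma = 200, c0 = 0)     # close to the hard switch

# Side-by-side comparison of the three weightings.
print(round(cbind(s, hard, smooth, steep), 3))
```

With a small gamma, points near the threshold get a weight between 0 and 1 instead of flipping between states. With a very large gamma, the weights away from the threshold are practically the same as the hard TAR switch.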

Let's see an example. Now more regime periods are detected than with TAR. It looks better than TAR, but in R there seems to be no way to apply LSTAR to a time series that has more than 2 states.



3. Markov Switching Model (MS Model)

I haven't finished studying the Markov model yet, so let me just show what we can do with it in R.

The steps for the MS model are:

1. Define y_t.

2. Define x1_t, x2_t, ..., the explanatory variables; the choice is arbitrary.

3. Estimate the linear model y ~ x1_t + x2_t + ...

4. Pass the parameters to the msmFit() function, then estimate the MS model.

As you saw, the MS model requires explanatory variables of our own choosing. I chose trigonometric functions that resemble the trend and the seasonal trend.
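The four steps can be sketched roughly as follows. This runs on synthetic weekly data, not the global_superstore data, and the 52-week sine and cosine terms, the size of the "high sales" jump, and the helper name fit_markov_switching are all hypothetical choices for illustration. The sw vector assumes one switching flag per regression coefficient plus one for the residual standard deviation, which is my understanding of that argument for lm fits.

```r
set.seed(1)
n_weeks <- 208                      # four years of weekly data (synthetic)
week <- seq_len(n_weeks)

# Hypothetical "high sales" state: the last 12 weeks of each 52-week year.
high <- rep(rep(c(FALSE, TRUE), times = c(40, 12)), times = 4)

# Step 1: define y_t (synthetic sales with a yearly cycle and a high state).
y <- 100 + 10 * sin(2 * pi * week / 52) + ifelse(high, 40, 0) +
  rnorm(n_weeks, sd = 5)

# Step 2: define explanatory variables (trigonometric terms, an arbitrary choice).
sales <- data.frame(
  y  = y,
  x1 = sin(2 * pi * week / 52),
  x2 = cos(2 * pi * week / 52)
)

# Step 3: estimate the linear model.
lin_mod <- lm(y ~ x1 + x2, data = sales)

# Step 4: wrap the msmFit() call; it is only run when the reader calls it,
# so the MSwM package is needed only at that point.
fit_markov_switching <- function(model, n_regimes = 2) {
  n_switching <- length(coef(model)) + 1   # coefficients + residual sd
  MSwM::msmFit(model, k = n_regimes, sw = rep(TRUE, n_switching), p = 0)
}

print(coef(lin_mod))
```

If MSwM is installed, calling `library(MSwM)` and then `ms_mod <- fit_markov_switching(lin_mod)` followed by `summary(ms_mod)` should show the per-regime coefficients and the transition probabilities.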


Source code is here.


Result is here.

It also splits the data into 2 states, high sales and low sales. However, when we look at the data in 2014, the MS model says that all of it is in regime 2. This seems to be caused by the long-term trend.

I demonstrated how to look at the trend in the sales data in more detail. A possible next step is to divide the data into 2 parts based on this result; then we may be able to conduct statistical tests or obtain better insights about the data.

In the next post, we will extract the data in regime 2 (high sales) and will see what contributes to the sales.

See you.

