Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.
Original question:
I am working on a spatial time series analysis project. The task is to study the spatial distribution of point features (e.g., crime events, traffic accidents) over time. I aim to find the places with the following characteristics given the spatial distribution of point features across time:
- places with a consistently high concentration of point features
- places with a periodically high concentration of point features. “Periodic” here might mean that a place has a large number of point features only during special events (e.g., ceremonies and the national day)
- places with a suddenly high concentration of point features
I have used Kernel Density Estimation to compute the density at each place in the study area for every period of the study timeline. This gives me a time series of kernel densities for each location on the map, i.e., a matrix in which rows represent locations and columns represent time periods. Then what’s next? How can I statistically find places that have a large number of point features but differ in how consistent that concentration is over time? For instance, the following figure shows the spatial distribution of kernel densities of locations in New York City for four consecutive periods (in total I have about 15 periods). Red indicates high kernel densities and green indicates low kernel densities.
I have tried the clustering techniques (e.g., KMeans and KShape) offered by the tslearn package in Python to cluster the kernel density time series of all the locations, but I can only tell the resulting clusters apart visually. Are there any statistical methods to achieve this goal?
Original answer:
I’m not sure if this fits your goal exactly, but you could try discretizing the area into a preset grid. For example, fit the KDE to all the data at time step 1, then integrate the KDE function over each grid square (a 2D integration) and save that value; repeat for every time step. Then you can check whether box 1, for example, has similar values at time step 1, time step 2, etc., or run other functions on that data. Choosing the grid size might be a bit tricky. You may have to try a few different sizes, and the right one depends on how much precision you need versus performance.
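A rough sketch of the grid idea, in case it helps. Everything here is an assumption for illustration rather than something from the original posts: the synthetic data, the 5 x 5 grid, the fixed isotropic Gaussian bandwidth, and the helper names `cell_mass` and `summarise`. With that kernel the 2D integral over a square factorises into differences of normal CDFs, so no numerical integration is needed. The "other functions" are per-cell mean, coefficient of variation, and peak z-score, which are just one possible choice.

```python
import math
import numpy as np

# Standard normal CDF, vectorised over NumPy arrays.
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def cell_mass(points, x_edges, y_edges, bandwidth):
    """Integrate an isotropic Gaussian KDE of `points` over each grid cell.

    points has shape (n, 2). Returns an (nx, ny) array holding the KDE
    probability mass that falls inside each cell.
    """
    points = np.asarray(points, dtype=float)
    # CDF of every point's kernel at every grid edge: shape (n, n_edges).
    cx = _norm_cdf((x_edges[None, :] - points[:, [0]]) / bandwidth)
    cy = _norm_cdf((y_edges[None, :] - points[:, [1]]) / bandwidth)
    # Mass of each kernel between consecutive edges, per axis.
    px = np.diff(cx, axis=1)  # (n, nx)
    py = np.diff(cy, axis=1)  # (n, ny)
    # Product kernel: the 2D integral is px * py, averaged over the points.
    return px.T @ py / len(points)


def summarise(series):
    """Per-cell summaries of a (cells, periods) matrix."""
    mean = series.mean(axis=1)
    std = series.std(axis=1)
    # Coefficient of variation: low means the cell is temporally consistent.
    cv = np.divide(std, mean, out=np.zeros_like(std), where=mean > 0)
    # Largest deviation of any period from the cell's own mean, in std units.
    peak_z = np.divide(series.max(axis=1) - mean, std,
                       out=np.zeros_like(std), where=std > 0)
    return mean, cv, peak_z


# Synthetic example (assumed data): 15 periods on a 10 x 10 study area.
rng = np.random.default_rng(0)
edges = np.linspace(0.0, 10.0, 6)  # 5 x 5 grid of 2 x 2 cells
columns = []
for t in range(15):
    parts = [rng.uniform(0, 10, size=(200, 2)),        # background events
             rng.normal([3, 3], 0.3, size=(100, 2))]   # steady hot spot
    if t == 7:
        parts.append(rng.normal([7, 7], 0.3, size=(150, 2)))  # one-off burst
    pts = np.vstack(parts)
    # Scale the mass by the event count so busier periods weigh more.
    columns.append(len(pts) * cell_mass(pts, edges, edges, bandwidth=0.5).ravel())

series = np.column_stack(columns)  # shape (25 cells, 15 periods)
mean, cv, peak_z = summarise(series)
print("highest mean:", mean.argmax(), "sharpest spike:", peak_z.argmax())
```

Cells with a high mean and low coefficient of variation would be candidates for "consistently high", while a high peak z-score with a low mean points to a sudden concentration. Periodic places would need something extra, such as checking the cell's values against a calendar of known events. Multiplying the mass by the event count is a design choice: without it, each period is normalised to sum to roughly 1, which hides periods that simply had more events overall.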