 13 December 2012, 15:30 sharp
MOX Seminar

Clustering of functional boxplots for multiple streaming time series

 Elvira Romano, Seconda Università degli Studi di Napoli
 Aula Seminari F. Saleri, 6th floor, MOX - Dipartimento di Matematica, Politecnico di Milano
Abstract

Data stream mining has gained a lot of attention due to the development of applications in which sensor networks monitor physical quantities such as electricity consumption, environmental variables, and computer network traffic. These applications must analyze potentially infinite flows of temporally ordered observations that cannot be stored and must be processed with limited computational resources. The on-line nature of these data streams requires incremental learning methods that update the knowledge about the monitored phenomenon every time a new observation is collected.
Among the exploratory tools for data stream processing, clustering methods are widely used for knowledge extraction. In this framework, a micro-clustering strategy for the functional synthesis of streaming time series is proposed. It is a two-step strategy: first, it performs an on-line summarization by means of functional data structures, named Functional Boxplot micro-clusters; then it produces the final summarization by processing these functional data structures off-line. The novelty of the proposed strategy consists in a new definition of micro-cluster based on Functional Boxplots and in a proximity measure that allows micro-clusters to be compared and updated. Unlike the existing CluStream methods, the method yields a finer graphical summarization of the streaming time series. The resulting synthesis keeps track of the dynamic evolution of the multiple streams.
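
For readers unfamiliar with the building block, the following is a minimal sketch of a plain (batch) functional boxplot in the sense of Sun and Genton, with curves ranked by the modified band depth of Lopez-Pintado and Romo. It is not the micro-cluster structure or the proximity measure presented in the talk, which the abstract does not specify. The function names and the returned dictionary layout are hypothetical, and the curves are assumed to be sampled on a common grid.

```python
from itertools import combinations

import numpy as np


def modified_band_depth(curves):
    """Modified band depth (bands built from pairs of curves).

    curves: array of shape (n_curves, n_points), all on the same grid.
    Returns one depth value per curve; larger means more central.
    """
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    depth = np.zeros(n)
    for j, k in combinations(range(n), 2):
        lower = np.minimum(curves[j], curves[k])
        upper = np.maximum(curves[j], curves[k])
        # Fraction of grid points at which each curve lies inside the band.
        inside = (curves >= lower) & (curves <= upper)
        depth += inside.mean(axis=1)
    return depth / (n * (n - 1) / 2)


def functional_boxplot(curves, factor=1.5):
    """Summaries of a functional boxplot (hypothetical helper).

    Returns the median curve, the envelope of the 50% central region,
    the fences obtained by inflating that envelope by `factor` times
    its pointwise range, and a boolean outlier flag per curve.
    """
    curves = np.asarray(curves, dtype=float)
    depth = modified_band_depth(curves)
    order = np.argsort(-depth)               # deepest curve first
    n_central = int(np.ceil(len(curves) / 2))
    central = curves[order[:n_central]]
    lower, upper = central.min(axis=0), central.max(axis=0)
    spread = upper - lower
    fence_lower = lower - factor * spread
    fence_upper = upper + factor * spread
    # A curve is flagged if it leaves the fences at any grid point.
    outlier = ((curves < fence_lower) | (curves > fence_upper)).any(axis=1)
    return {
        "median": curves[order[0]],
        "lower": lower,
        "upper": upper,
        "fence_lower": fence_lower,
        "fence_upper": fence_upper,
        "outlier": outlier,
    }
```

In a streaming setting, summaries of this kind would have to be computed per window and updated incrementally rather than recomputed from stored curves; the pairwise loop above is quadratic in the number of curves and is meant only to make the definition concrete.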


Main References

1. Adelfio G., Chiodi M., D'Alessandro A., Luzio D., D'Anna G., Mangano G. Simultaneous seismic wave clustering and registration. Computers & Geosciences 44, 60-69. ISSN: 0098-3004. DOI: 10.1016/j.cageo.2012.02.017. (2012)
2. Aggarwal C. C., Han J., Wang J., Yu P. S. A Framework for Clustering Evolving Data Streams. In Proc. of the 29th VLDB Conference. (2003)
3. Balzanella A., Lechevallier Y., Verde R. Clustering Multiple Data Streams. In New Perspectives in Statistical Modeling and Data Analysis. Springer. ISBN: 978-3-642-11362-8. DOI: 10.1007/978-3-642-11363-5-28. (2011)
4. Beringer J., Hüllermeier E. Online clustering of parallel data streams. Data and Knowledge Engineering, 58(2). (2006)
5. Bi-Ru Dai, Jen-Wei Huang, Mi-Yen Yeh, and Ming-Syan Chen. Adaptive Clustering for Multiple Evolving Streams. In IEEE Transactions On Knowledge And Data Engineering, Vol. 18, No. 9. (2006)
6. Gama J., Gaber M.M. (Eds). Learning from Data Streams: Processing Techniques in Sensor Networks. Springer Verlag. (2007)
7. Guha S., Meyerson A., Mishra N. and Motwani R. Clustering Data Streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 3, pp. 515-528. (2003)
8. Lopez-Pintado S., Romo J. On the Concept of Depth for Functional Data. Journal of the American Statistical Association, 104, 718–734, (2009).
9. Ramsay J.O., Silverman B.W. Functional Data Analysis (Second ed.). Springer. (2005)
10. Romano E., Balzanella A., Rivoli L. Functional boxplots for summarizing and detecting changes in environmental data coming from sensors. In Electronic Proceedings of Spatial 2, Spatial Data Methods for Environmental and Ecological Processes, 2nd Edition. Foggia, 1-3 September 2011.
11. Sangalli L.M., Secchi P., Vantini S., Vitelli V. k-mean alignment for curve clustering, Computational Statistics & Data Analysis, Volume 54, Issue 5, 1 May 2010, Pages 1219-1233, ISSN 0167-9473, 10.1016/j.csda.2009.12.008.
12. Sun Y., Genton M.G. Functional boxplots. Journal of Computational and Graphical Statistics, 20, 316-334. (2011)