

Journal article

Modeling a non-stationary bots’ arrival process at an e-commerce Web site

From

Institute of Mathematics and Informatics, Opole University, ul. Oleska 48, 45-052 Opole, Poland

Faculty of Electrical Engineering, Automatic Control and Informatics, Opole University of Technology, ul. Prószkowska 76, 45-758 Opole, Poland

The paper concerns the issue of modeling and generating a representative Web workload for Web server performance evaluation through simulation experiments. Web traffic analysis has been carried out for two decades, usually based on Web server log data. However, while the character of the overall Web traffic has been extensively studied and modeled, relatively few studies have been devoted to the analysis of Web traffic generated by Internet robots (Web bots).

Moreover, the overwhelming majority of studies concern the traffic on non-e-commerce websites. In this paper we address the problem of modeling a realistic arrival process of bots’ requests on an e-commerce Web server. Based on real log data for an online store, sessions generated by bots were reconstructed and their key features were analyzed, including the interarrival time of bot sessions, the number of HTTP requests per session, and the interarrival time of requests within a session.
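As an illustration of the session reconstruction step, the following minimal Python sketch groups logged requests into sessions and derives the three features listed above. The 30-minute inactivity timeout, the `(client_id, timestamp)` input format, and the function names are assumptions made for this example; the abstract does not state how sessions were actually reconstructed.

```python
from collections import defaultdict

# Assumed inactivity threshold in seconds (a common convention, not taken from the paper).
SESSION_TIMEOUT = 30 * 60


def reconstruct_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group (client_id, timestamp) pairs into sessions.

    A new session starts when the gap between two consecutive requests
    from the same client exceeds `timeout` seconds.
    """
    by_client = defaultdict(list)
    for client_id, ts in requests:
        by_client[client_id].append(ts)

    sessions = []
    for timestamps in by_client.values():
        timestamps.sort()
        current = [timestamps[0]]
        for ts in timestamps[1:]:
            if ts - current[-1] > timeout:
                sessions.append(current)
                current = [ts]
            else:
                current.append(ts)
        sessions.append(current)

    # Order sessions by start time so session interarrival times are well defined.
    sessions.sort(key=lambda s: s[0])
    return sessions


def session_features(sessions):
    """Return session interarrival times, requests per session,
    and request interarrival times within sessions."""
    starts = [s[0] for s in sessions]
    session_interarrival = [b - a for a, b in zip(starts, starts[1:])]
    requests_per_session = [len(s) for s in sessions]
    request_interarrival = [b - a for s in sessions for a, b in zip(s, s[1:])]
    return session_interarrival, requests_per_session, request_interarrival
```

Each session is represented simply as a sorted list of request timestamps, which is enough to compute all three features; a real log parser would also need to identify which clients are bots.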

To deal with the problem of non-stationarity of the Web traffic, chunks associated with times of day were distinguished based on the intensity of bot sessions’ arrivals and then features of sessions in individual time chunks were analyzed separately. Using regression analysis, a mathematical model of the bots’ traffic features was developed and implemented in a bot traffic generator.

Our findings confirm the existence of heavy tails in the distributions of bot traffic features. The bots’ session interarrival times and request interarrival times are best modeled by a Weibull and a sigmoid distribution, respectively, while the model proposed for the number of requests per bot session is based on a hybrid function combining one exponential and two normal distribution functions.
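To illustrate how a fitted interarrival model could drive a traffic generator, the sketch below draws bot session arrival times from a Weibull distribution, with a separate parameter pair per time-of-day chunk. The chunk names, the parameter values, and the function name are hypothetical placeholders, not figures from the paper.

```python
import random

# Hypothetical (scale, shape) Weibull parameters per time-of-day chunk.
# Placeholders for illustration only; not values reported in the paper.
CHUNK_PARAMS = {
    "night": (120.0, 0.6),
    "day": (45.0, 0.7),
}


def generate_session_arrivals(chunk, n_sessions, seed=None):
    """Return cumulative arrival times (seconds) for `n_sessions` bot sessions.

    Interarrival times are sampled independently from a Weibull
    distribution whose parameters depend on the time-of-day chunk.
    """
    rng = random.Random(seed)
    scale, shape = CHUNK_PARAMS[chunk]
    t = 0.0
    arrivals = []
    for _ in range(n_sessions):
        # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
        t += rng.weibullvariate(scale, shape)
        arrivals.append(t)
    return arrivals


if __name__ == "__main__":
    print(generate_session_arrivals("day", 5, seed=42))
```

A shape parameter below 1, as assumed here, gives a tail heavier than that of an exponential distribution, which is in line with the heavy-tailed behavior reported above.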

The goodness of fit of the model was confirmed by the high correlation between the real and model data. Furthermore, a visual inspection of the simulation results showed that the estimated values represent distributions close to those of the empirical data.

Language: English
Year: 2017
Pages: 198-208
ISSN: 1877-7503 and 1877-7511
Types: Journal article
DOI: 10.1016/j.jocs.2017.05.017
