Progress Report to the National Science Foundation on Grant BIR-9318160 Parallelization of Individual-Based Ecological Models Submitted by: Louis J. Gross (gross@math.utk.edu) Institute for Environmental Modeling and Mathematics Department University of Tennessee, Knoxville, TN 37996-1300 July 7, 1995 Investigators and their affiliations: L. J. Gross, T.G. Hallam, H-K. Luh and S. Ramachandramurthi Institute for Environmental Modeling and Mathematics Department, UTK M. Leuze, Joint Institute for Computational Science, UTK M. Berry, Computer Science Department, UTK M. McKinney, Mathematics Department, NC State Univ., C. Abbott and J. Dempsey, Grad Students, Computer Science Department, UTK J. Nichols and S. Sylvester, Grad Students, Mathematics Deptartment, UTK H-L. Lee, Grad Student, Ecology Program, UTK Brief description and status of research carried out this past year: There are three major subprojects included in this report, all of which involve the development of parallelization schemes for ecological models which track the behavior of individual organisms, making use of a variety of architectures and platforms. The three components (spatially-explicit rule-based models, structured population models derived from partial differential equations, and spatially-explicit community models) are each discussed briefly below. Spatially-explicit rule-based models track the behavior of individual organisms on realistic landscapes, including rules for foraging, movement, growth, mortality, and reproduction. We have been developing parallel implementations of a model for white-tailed deer and Florida panther in South Florida, coupled to spatially-varying water levels and vegetation models. These have been developed initially using the CM-5 and MP-2, with a focus over the past year on data-parallel procedures for spatial interpolation of the hydrology and vegetation components. Structured population models derived from partial differential equations track the behavior of many cohorts of individuals within a population, where each cohort may include many individuals. We have been developing parallel implementations of a Daphnia model, designed in part to be applied to analyze the effects of toxicants on population levels and structure. The focus of these efforts to date have been on the CM-5 and using PVM over a network of Sparcs, though some runs have been made on the Intel iPSC/860 as well. In carrying this out, a modular approach was used in order to allow rapid development of new algorithms, and application to new machines. A spatially-explicit community model for freshwater mussels has been developed to analyze changes in community composition through time along various reaches of a river/lake system. The model includes age-structured populations of 35 species of mussels along linear arrays of cells representing different spatial cells of a river. This has been implemented on the MP-2, in which each river cell is handled by a single processor element. During dispersal phase, the front-end of the computer is used to combine dispersing larvae from many cells and then redistributing them back to respective processors as appropriate. Parallelization of a Spatial-explicit Individual-based Model: Introduction: The objective of this study is to implement data-parallel procedures for performing simulations of a spatially-explicit, individual-based Deer/Panther Model on single instruction, multiple data stream (SIMD) computers. Simulations of an individual-based model often require data sets that encompass broad spatial scales and extensive historical environmental information. Conventional single processor computers quickly approach their physical limits, in terms of both computational efficiency and data volume. Massively parallel computers can perform operations simultaneously over large spatial data sets, providing elegant and efficient solutions for certain problems in ecological modeling. In this study, we have developed several parallelization strategies on two platforms, the MasPar Model MP-2 and the CM-5 Connection Machine. Parallel Implementations: In order to take the full advantage of the performance of different parallel computers, the selection of parallel strategies depends primarily upon the task to be performed and the parallel architectures. In this project, we can loosely classify the deer/panther model into two levels: information and individual levels. At the information level, environmental data (such as water depths and forage quantities) are obtained by local operations without explicit information about neighbors' distributions. Parallelism can be easily established by partitioning environmental data into many smaller areas which are assigned to parallel processors. Each processor, therefore, expends approximately the same amount of computational effort as other processors. At the individual level, individuals of the deer and the panther submodels collect information provided by the previous processing level and interact with one another based upon the simulation of competition and predator-prey dynamics. Interpolation algorithms used: Bilinear interpolation: spatial data are interpolated by linear distance functions. Knowledge-based interpolation: spatial data are interpolated by expected water depths of vegetation types. Message passing methods used: Local map: each processing element or node carries a whole spatial data set. Partitioned map: control processor partitions the spatial data set and assigns the necessary information to each processing element or node. Figure. Comparison of elapsed wallclock times between sequential and parallel implementations. Data Mapping Algorithms used: To assign each processor spatial data having nearly equal area, we need to consider the parallel architectures. The MasPar MP-2 consists of 4096 processor elements (PE) which are arranged via an 8-way communication network (i.e., a 64 X 64 PE array). The spatial data, therefore, are divided into as many rectangular blocks as there are PEs, with each PE being allocated one logically contiguous block representing a subgrid of the original spatial data set. On the other hand, the processor scheme of the CM5 Connection Machine is a "fat tree"--a binary tree with wider links at higher level. To balance the workload on each of the 32 processing nodes of the CM5, the spatial data are partitioned into groups of complete contiguous rows, so that each processing node contains an approximately equal area. Because of the irregular shape of the study area, the number of rows per node is non-uniform. The row striped partitioning strategy allows for dynamic repartitioning by sending an entire row's contents to a Performance statistics: We have developed and tested two parallel implementations and two spatial data interpolation algorithms on the MasPar Model MP-2 and the CM5 Connection Machine, and compared performances with sequential implementations on a Sun SPARCStation 20. Performance statistics indicate that data-parallel implementations can attain significant speed improvements over sequential implementations in a long-term simulation. For a short-term simulation, 1 or 10 days, there is no significant speed improvement. Clearly, data I/O and message passing cause large overhead within the shorter time frames. Since these two operations perform only once in each run (regardless of the simulation period), they take relatively small proportion within the longer time frames. (See Luh et al. 1995). Parallelization of Individual-Based Structured Population Models Parallelization of a Daphnia population model: In order to develop efficient parallel algorithms, we needed a good sequential implementation of the basic Daphnia population model. Such a sequential program must serve two purposes (i) it must serve as a benchmark against which to compare any parallel implementation, and (ii) by elucidating the structure inherent in the model, it must serve as a convenient starting point for developing parallel programs with minimum effort. A Modular Approach: In order to meet these objectives, we designed a sequential program with a highly modular structure. We defined our elementary unit of simulation to be the simulation of one Daphnia cohort for one discrete time step. We have developed the "Daphnia module" to perform this elementary unit of simulation. Through repeated calls to the Daphnia module, it is a straightforward process to simulate a large Daphnia population consisting of numerous cohorts for any length of time. This is the task of the "driver" module. Our implementation of the Daphnia population model captures the structure in the model rather elegantly. While the Daphnia module simulates the behavior of an individual, the driver module computes the behavior of the population as a whole based on the constituent individuals. Thus we have developed an efficient sequential program that meets both the objectives stated above. Parallel Algorithms: We set two goals for our parallelization effort: (1) we must achieve efficiency in terms of speed-up over the sequential code, and (2) our parallel programs must be easily portable to other architectures that are based on similar programming paradigms. Two different parallelization strategies were attempted. 1. Parallelization by ecotype: The Daphnia population is structured into a finite number of disjoint ecotypes based on the values of three biological parameters. This suggests a simple way to distribute the population across a set of processors. If there are n ecotypes and p processors, then each processor is assigned all n/p ecotypes. However, as the simulation progresses, evolution dictates that only a few ecotypes will dominate the population while the rest gradually die out. This results in uneven usage of the processors. Eventually, the simulation becomes localized to one or two processors while the rest are idle. Therefore, this method did not yield very good speed-ups. 2. Parallelization with load balancing: The cohorts that constitute the initial population are distributed evenly across all the processing nodes without regard to their ecotype. In this approach, our parallel algorithm consists of four different modules, namely, the Daphnia module, the driver module, the load balancing module, and the Input/Output module. A copy of the Daphnia module and the load balancing module reside at each processing node. A specially designated node also serves as the host and it alone executes the Input/Output module and the driver module. At each time step, each node executes the Daphnia module to simulate each cohort in its local population. Then the host executes the driver module in order to assess the changes in the population as result of individual behavior. This entails synchronization of all the nodes and some communication. Since the populations on different processing nodes may grow at different rates, the distribution of cohorts across the nodes can become unbalanced during the simulation. We have devised an efficient parallel load balancing algorithm to remedy such a situation. When necessary, all the nodes execute the load balancing module in parallel in order to re-distribute the population evenly. Implementation and Performance: Originally, we implemented all these algorithms on the Thinking Machines CM-5. They have since been adapted to run on a network of heterogeneous workstations using PVM (Parallel Virtual Machine). The software that we have developed for the CM-5 can be ported easily to the Intel iPSC/860. The PVM implementation can be directly ported and run without any modification on parallel computers such as the IBM SP-2 and the Intel Paragon. The following figure summarizes the performance of our load-balancing algorithm on a 32-node CM-5. The data represents a 150-day simulation of a 20-day old Daphnia population which initially consisted of 267 cohorts. For this simulation, each time step was 1/20 of a day, and the load-balancing algorithm was invoked at the end of every day. The figure shows that the speed-up achieved by our algorithm is very close to the ideal speed-up. http://mathsun1.math.utk.edu:8000/~gross/nsfreport.795.figure1.ps Future Work: We consider our efforts to parallelize the Daphnia population model to be part of a larger mission, namely, the development of efficient algorithms to simulate an integrated food-web model. The next step in this direction is the development of parallel algorithms to implement a predator-prey model in which Daphnia are the prey and fish are predators. The modular approach that we have used to develop algorithms for the Daphnia population model will ensure that these algorithms will serve as valuable components in our future work. Moreover, the methods we used for Daphnia will be applicable to fish models and other species in the food-web as well. A Parallel Spatially-explicit Ecological Community Model: Introduction: The southeastern United States has long been considered to be the "freshwater mussels center" of the world (Williams and Neves, 1992). It has the richest and most diverse number of species of freshwater mussels anywhere in the world. But unfortunately, since the early nineteenth century the freshwater mussel communities have steadily declined in population size as well as species diversity due to anthropogenic activities (Stansbery and Clench, 1974; Bates and Dennis, 1978; Ahlstedt, 1984). Now many species of these living fossils are on the brink of extinction particularly the Unionids. Life-cycle of the Unionids: There are approximately thirty-five extant species of Unionids in the United States, with most species located in the south- eastern region (Dennis, 1984). The Unionids prefer well-oxygenated shallow water habitats with stable coarse sand or sand-gravel mixture beds (Stone et al., 1982; Salmon and Green, 1983; Cooper, 1984; Machena and Kautsky, 1988; Way et al., 1990). The mussels have life-spans ranging from nine years up to ninety years (Stein, 1971; Lewandowski and Stanczykowska, 1975; Burky, 1983; Jirka, 1986). The onset of sexual maturity occurs anywhere from one year to as late as six years depending on the species (Burky, 1983; Pennak, 1983; McMahon, 1991). Environmental parameters and neurosecretory controls trigger the start of the breeding season, normally in spring or early summer. Fertilization and incubation of the fertilized eggs take place in the gill marsupia of the female mussels. The fecundity and incubation period of the mussels are species dependent. Fecundity rate ranges from 200,000 glochidia/female mussel/breeding season to 17,000,000 glochidia/female mussel/breeding season (Young and Williams, 1984; Paterson and Cameron, 1985; Parker et al., 1987). Glochidia, the parasitic larval stage of the mussels, are released by the female mussels, then attached themselves to suitable fish- hosts before dropping off upon completion of their development into juveniles. Figure 1 shows the life-cycle of the Unionids. Methodology: The computer model used in the study is based on the Roughgarden et al.(1985) concept of a single open marine population with space-limited recruitment. Due to the unique life-cycle of the Unionids and their environment, it is necessary to generalize Roughgarden et al.'s original concept in several ways to mimic reality as closely as possible. The Leslie age structure concept is used in the model to structure the mussel population. Each species of mussel in each cell has up to a maximum of 15 different age classes. The physiological growth of the mussels will be described by a deterministic function form that is parameterized individually for each species. The probability of survival from one age class to another is a function of the mussel's species, age and its immediate environment. Reproduction will begin when sexual maturity is reached and occurs at the same time each year. The recruitment/settlement of juveniles on a given cell will be a function of the availability of free space in the cell and the suitability of the habitat. Parallelization of the Computer Model: Due to the nature of the problem and the unique characteristics of the Unionid mussels, it is more suitable to construct a parallel computer code rather than the traditional serial code. The code is written in FORTRAN 90 for a MasPar MP-2 machine. Briefly, the concept of the parallel code is as follows. The river is divided into cells in a linear aray. In each cell, the hydrodynamic and geophysical characteristics of the river is assumed to be uniform. However each cell has its own unique properties which determines its suitability as a habitat for the mussels. As the mussels are sessile organisms, it is safe to assume a mussel in cell i will not migrate to any other cells. But glochidia release by a mussel in cell i can settle in any suitable cell. The distance the glochidia/juvenile traveled is determined by its fish host, but the cell the juvenile settled in is at random. In the computer code, the approach taken was to parallelize the code by river cells. Each cell with its accompanied mussel communities is assigned to a micro-processor. While this simple assignment works, it is not very efficient. This is because some cells are empty (have no mussels) or a relatively small number of mussels compared to others. This leads to a load-balancing problem among the processors and is currently being looked into. As mentioned earlier, mussels are sessile organisms and hence there are minimal or no interactions between the different mussel communities in different parts of the river except during the dispersion/settling of glochidia/juvenile. Therefore in the code, the mussel communities in cell i do not interact with other mussel communities in other cells, which minimizes communication between processors. This is the primary reason for the parallelization of the code by river cells. However, the processors will still have to communicate during the dispersion/settling period of the glochidia/juvenile. Presently, during the dispersion/settling period in time, all information is passed to the front-end of the computer and then passed back to the respective processors after the computations. While not efficient, it is acceptable as this period only occur three times a year. A further improvement is to number the main river branch cells sequentially first followed by the tributaries. Then information is easily passed between processors in the DPU end without having to resort to front -end computation. The numbering system also ensures the distance the information travels between processors is minimize, as the probability of glochidia settling in the main river branch is significantly higher than in the tributaries. In any event, a global router is still necessary to pass information between processors far apart, as when glochidia from main river branch settles in the tributary. This reduces the efficiency of the code. A possible solution, is to ensure each cell in the main river branch which has a tributary is properly assigned to a processor with at least four neighbors, each only a unit away. Hence, it is possible to do away with a global router to pass information between main river branch and tributary. But this method has many severe limitations. An intimate knowledge of the physical location of each cell in the river system is necessary, to allow correct assignment of cell to processor. This makes the code hard to use as well as not being portable from machine to machine. Furthermore an addition of even one tributary into the original river system later will require a complete remapping of all the cells to the processors. A better approach is being sought. Finally, at the end of the simulation all information is sent to the front end for output. References: Ahlstedt, S.A. 1984. Twentieth century changes in the freshwater mussel fauna of the Clinch River (Tennessee and Virginia). M.S. Thesis, University of Tennessee, Knoxville. Bates, J.M., and S.D. Dennis. 1978. The mussel fauna of the Clinch River, Tennessee and Virginia. Sterkiana 69-70:3-23. Burky, A.J. 1983. Physiological ecology of freshwater bivalve. Pages 281-327 in W.D. Russell-Hunter, ed. The Mollusca. Vol. 6: Ecology. Academic Press, New York. Cooper, C.M. 1984. The freshwater bivalves of Lake Chicot, an oxbow of the Mississippi in Arkansas. Nautilus 98:142-145. Dennis, S.D. 1984. Distributional analysis of the freshwater mussel fauna of the Tennessee River system, with special reference to possible limiting effects of siltation. Phd. Thesis. Virginia Polytechnic Institute and State University, Blacksburg. Jirka, K.J. 1986. Reproduction biology and comparative growth rates of selected species of freshwater mussels (Bivalvia: Unionidae) in the New and Greenbriar Rivers, Virginia and West Virginia. M.S. Thesis. Virginia Polytechnic Institute and State University, Blacksburg. Lewandowski, K., and A. Stanczykowska. 1975. The occurrence and role of bivalves of the family Unionidae in Mikolajskie Lake. Ekologia Polska 23(2):317-334. Luh, H.-K., C. Abbott, M. Berry, E. J. Comiskey, J. Dempsey, L. J. Gross. (1995) Parallelization in a spatially-explicit individual-based model (I) - Spatial data Interpolation. (in preparation). Computer and Geosciences. Machena, C., and N. Kautsky. 1988. A quantitative diving survey of benthic vegetation and fauna in Lake Kariba, a tropical man-made lake. Freshwater Biology 19:1-14. McMahon, R.F. 1991. Mollusca: Bivalvia. In J.H. Thorp and A.P. Covich, eds. Ecology and Classification of North American Freshwater Invertebrates. Academic Press, San Diego. Parker, R.S., C.T. Hackney, and M.F. Vidrine. 1984. Ecology and reproduction strategy of a south Louisiana freshwater mussel, Glebula rotundata (Lamarck) (Unionidae: Lampsilini). Freshwater Invertebrate Biology 3:53-58. Paterson, C.G., and I.F. Cameron. 1985. Comparative energetics of two populations of the unionid, Anodonta cataracta (Say). Freshwater Invertebrate Biology 4:79-90. Pennak, R.W. 1953. Freshwater invertebrates of the United States, Chapter 37: Pelecypoda (Clams, Mussels). Ronald Press, New York. Roughgarden, J., Y. Iwasa, and C. Baxter. 1985. Demographic theory for an open marine population with space-limited recruitment. Ecology 66(1):54-67. Salmon, A., and R.H. Green. 1983. Environmental determinants of unionid clam distribution in the Middle Thames River, Ontario. Canadian Journal of Zoology 61:832-838. Stansbery, D.H., and W.J. Clench. 1974 The Pleuroceridae and Unionidae of the North Folk Holston River above Saltville, Virginia. Am. Malacol. Union Inc. Bull. 1973:33-36. Stein, C.B. 1971. Naiad life cycles: their significant in the conservation of the fauna. In S.E. Jorgensen and R.W. Sharp, eds. Proceedings of a symposium on rare and endangered mollusks (Naiads) of United States. Ohio State University, Columbus. Stone, N.M., R. Earl, A. Hodgson, J.G. Mather, J. Parker, and F.R. Woodward. 1982. The distributions of three sympatric mussel species (Bivalvia: Unionidae) in Budworth Mere, Cheshire. Journal of Molluscan Studies 48:266-274. Way, C.M., A.C. Miller, and B.S. Payne. 1990. The influence of physical factors on the distributions and abundance of freshwater mussels (Bivalvia: Unionacea) in the lower Tennessee River. Nautilus 103:96-98. Williams, J.E., and R.J. Neves. 1992. Introducing the elements of biological diversity in the aquatic environment. Pages 345- 354 in R.E. McCabe, ed. Trans. of 57th North American Wildlife and Natural Resoureces Conf., Wildlife Management Inst., Washington D.C. Young, M.R., and J. Williams. 1984. The reproductive biology of the freshwater pearl mussel Margaritifera margaritifera (Linn.) in Scotland I. Field studies. Archiv fuer Hydrobiologie 99:405-422.