Published on 31 October 2024
As of last night at the AGM, I am now a new Committee Member for PythonNZ. This is the local community group for the Python language and ecosystem, here in Aotearoa New Zealand. We hold meetups and host an annual event called KiwiPycon. While I don’t use Python in everything, I do use it almost every day I am on a computer, and it is great to be able to support and be a member of a local tech community.
I gave a talk at the first KiwiPycon I attended, in 2024, on iNaturalist and eBird. You can watch it here.
Published on 31 October 2024
This is a simple form to get all of the eBird hotspots for a region.
Published on 24 October 2024
I’m working with larger datasets from eBird than I am used to, as part of my PhD work involving community science. You can request downloads of eBird data from the website, which gives you a lot of observations that you can then use to do your work. In my case, I’m building a dataset of all of Aotearoa’s bird observations in order to build an agent-based model that I can use to track avian flu.
I’ve been having issues with the size of the dataset - it’s a bit slow to load. Rather than download another sample dataset on a smaller timescale, I decided to write a function that filters the dataset down using the sampling (checklist) data and then the observation (individual species) data from eBird. I spent an hour or so writing a Python file that did this, but ran into memory constraints I could have avoided if I had been cleverer about using streams or line-by-line filters.
But rewriting it in R helped.
#' Filter and sample observations data frame
#'
#' This function filters and samples a given `checklists` data frame and filters the `observations`
#' data frame based on the sampled checklists' "SAMPLING EVENT IDENTIFIER". This allows you to run more
#' stuff on a smaller subset of an eBird database export, so you don't need to spend a lot of time waiting
#' for everything to run.
#'
#' License: MIT © Richard Littauer
#'
#' @param checklists Data frame containing checklists.
#' @param observations Data frame containing observations.
#' @param sample_rate Sampling rate; every nth row will be selected from the `checklists`. Default is 1000.
#' @param checklists_output_file The name of the file where the sampled checklists will be saved. Default is "sampled_checklists.csv".
#' @param observations_output_file The name of the file where the filtered observations will be saved. Default is "filtered_observations.csv".
#'
#' @return A list containing two data frames: `sampled_checklists` and `filtered_observations`.
#' @export
sample_checklists_and_observations_dfs <- function(checklists, observations, sample_rate = 1000,
                                                   checklists_output_file = "sampled_checklists.csv",
                                                   observations_output_file = "filtered_observations.csv") {
  # Reduce the sampling size by keeping every nth row
  sampled_checklists <- checklists[seq(1, nrow(checklists), by = sample_rate), ]

  # Extract "SAMPLING EVENT IDENTIFIER" from the sampled data
  sampling_event_ids <- sampled_checklists$sampling_event_identifier

  # Filter the observations based on the "SAMPLING EVENT IDENTIFIER" values
  filtered_observations <- observations[observations$sampling_event_identifier %in% sampling_event_ids, ]

  # Save the sampled checklists and filtered observations to CSV files
  write.csv(sampled_checklists, checklists_output_file, row.names = FALSE)
  write.csv(filtered_observations, observations_output_file, row.names = FALSE)

  # Return the modified data frames
  return(list(sampled_checklists = sampled_checklists, filtered_observations = filtered_observations))
}
If this looks a bit like a ChatGPT answer - that is because it is. I haven’t found a faster way to help me understand basic things while coding, yet. Google and DuckDuckGo are horrendous at returning valuable data on how to think about simple things. It’s true that ChatGPT is giving me an older dataset, and that there may be newer ways of dealing with this data. But for the level of coding that I am at, I think this is ultimately trivial.
I’m not happy using ChatGPT. But with clear, concise instructions on what I want, it does help me get to a function like this more easily.
This work is something that I am doing as part of my continual run-through of eBird’s best practices. Up next: really filtering this data better, and possibly using an R Shiny app or a Jupyter Notebook.
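For what it’s worth, the line-by-line approach mentioned above might look something like this in Python. This is a minimal sketch under stated assumptions, not the original script: it assumes the raw, tab-separated export, where the checklist ID column is headed "SAMPLING EVENT IDENTIFIER", and the function names are made up for illustration. It never holds more than one row in memory, apart from the set of sampled checklist IDs.

```python
ID_COLUMN = "SAMPLING EVENT IDENTIFIER"  # assumed column header in the raw export


def _id_position(header_line, delimiter):
    """Find which column holds the checklist ID."""
    return header_line.rstrip("\n").split(delimiter).index(ID_COLUMN)


def sample_checklists(src, dst, sample_rate=1000, delimiter="\t"):
    """Copy every nth checklist row from src to dst; return the kept IDs."""
    kept_ids = set()
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        header = fin.readline()
        position = _id_position(header, delimiter)
        fout.write(header)
        for index, line in enumerate(fin):
            if index % sample_rate == 0:  # rows 1, 1 + n, 1 + 2n, ...
                fout.write(line)
                kept_ids.add(line.rstrip("\n").split(delimiter)[position])
    return kept_ids


def filter_observations(src, dst, kept_ids, delimiter="\t"):
    """Copy only the observation rows whose checklist ID was sampled."""
    kept = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        header = fin.readline()
        position = _id_position(header, delimiter)
        fout.write(header)
        for line in fin:
            if line.rstrip("\n").split(delimiter)[position] in kept_ids:
                fout.write(line)
                kept += 1
    return kept
```

You would call sample_checklists() on the sampling file first, then pass the returned set of IDs to filter_observations() along with the observations file. Splitting on the delimiter by hand, rather than using a CSV parser, keeps each row byte-for-byte as it was in the export, at the cost of not handling quoted fields that contain tabs.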
Published on 05 July 2024
I am the eBird reviewer for St. Helena, Ascension, and Tristan da Cunha, three separate island archipelagos in the south Atlantic. They each have their own different ecology and biomes, with their own bird life that includes endemics for each group. Ascension and St. Helena are individual islands, with some small islets that often host large bird populations due to the lack of introduced predators. Tristan da Cunha has three main islands - Tristan, Nightingale, and Inaccessible - and then a fourth island much further south, Gough Island, which I sometimes think of as its own separate island group.
Recently, I was going through an old email thread from last year between my colleague Ian Worley, me, and two other ornithologists, Andy Schofield and Steffen Oppel. Schofield mentioned that there is a marked difference in presence for Kerguelen Petrels at a few seamounts to the east of Gough: Yakhont and Crawford Seamount. I hadn’t heard of either of these seamounts before, and started getting curious.
The following image shows where these seamounts are. This is taken from a paper by Requena et al. 2020, and it helpfully names a few other seamounts - McNish and Zenker, R.S.A., and the Walvis Ridge, while also showing the different pelagic provinces that surround the islands. This also shows why Gough is so different from Tristan da Cunha and the other two islands; it is a few hundred kilometers south, which places it outside of the south central Atlantic Gyre, which means that the water temperature and nutrients are different.
The differences in the water realm are most marked near seamounts - islands which never break above the sea, and which aren’t quite high enough to be reefs. There, water from the depths is brought up, bringing nutrients for plankton, and there is more habitat for different types of fish and other life, which in turn means more food for pelagic birds. Nesting birds on Gough might preferentially fly out to these seamounts to feed, something that Requena et al. discovered by tracking the movements of a few species they tagged on Gough.
The large expanse of water near the islands isn’t necessarily homogeneous, and these ranges under the sea help influence life above them. As an eBird reviewer, this is important, because it means that when observers are taking ships to and from the islands, especially on large tours during the seasonal visiting months (normally April), their pelagic observations won’t always have the same qualities. As a reviewer, it’s my job to ensure that entries into the database reflect accurate observations from users, and that they don’t have egregious data entry errors that would make the data less useful to future researchers. If someone saw many more birds in one spot than another, I might think it is suspicious. But knowing that there is a seamount there may help me judge whether an aberrant observation should be included in the database or not.
Recently, Michael Schrimpf at eBird developed a massive mapping system that split up the pelagic High Seas and coastal reviewing areas into more discrete portions, which means that I can build separate filters for each island, and for some pelagic areas around them. Going forward, working observations near seamounts into the filters may be important, too. An observation of 30 Kerguelen Petrels 100 kilometers to the west of Gough would be far less likely than one 100 kilometers to the east, as there’s no seamount to the west. That’s interesting to know.
Of course, being an eBird reviewer can be difficult, because I am not on the ground (or on the waves) observing birds, and I am both helping to curate the data as well as using it to improve my understanding of bird movements in the area. My assumptions are important to understand, because when an observation is in conflict with what I think is happening, I have to ask whether I know enough to judge whether an observation is likely to be accurate, or, at the least, sufficiently documented.
Knowing about these mountains under the sea - some of which have names, some of which don’t - will be helpful. After researching the seamounts near Tristan da Cunha, I looked for a few more seamounts to the north, for St. Helena and Ascension.
I found this map of some seamounts north of St. Helena, towards Ascension. I also found another map of a final seamount, Harris Crawford, far to the west of Ascension. These aren’t exactly terra incognita - this map came from a journal that described new fish found during a fishing expedition (Edwards, 1993), and fishermen are often the first humans to know about seamounts, as they’re much more productive for certain species than the open ocean. That’s one of the reasons many of these seamounts are named.
It’s useful for me to know that these seamounts exist, but it may also be useful for birders who are visiting these areas to know that they’re near one. In order to facilitate that, I’ve set up hotspots for each of the named seamounts. There are many seamounts, and some which don’t reach quite as far up as others. I decided to arbitrarily only include named seamounts in order to limit hotspot proliferation - hundreds of hotspots may not be as useful as just a few for general areas. I also decided not to make hotspots for the Walvis Ridge, as it’s farther from the islands and as it extends all of the way to Walvis, a coastal city in Namibia. Setting one there would be akin to saying “Green Mountain hotspot” for the entire state of Vermont, on just one mountain.
I still haven’t figured out which seamounts lie between St. Helena and Tristan da Cunha, but some sleuthing on bathymetric maps may show some. There also may be some near Trindade, closer to Brazil.
Hopefully, some birders going past these seamounts will select the hotspot location instead of a personal location, and we’ll be able to slowly accumulate an idea of what birds use them throughout the year. For now, there’s nothing to show for the effort of making them - I have to wait and see if they’re used. But that’s one of the fun parts of making hotspots. You create them, and wait.
Of course, I hope that one day I’ll be able to go out and see these locations myself. Until then, I’ll keep going to the south Atlantic in my mind every week as I review observations from others.
Cited
- Edwards, A. J. (1993). New records of fishes from the Bonaparte Seamount and Saint Helena Island, South Atlantic. Journal of Natural History, 27(2), 493-503.
- Requena, S., Oppel, S., Bond, A. L., Hall, J., Cleeland, J., Crawford, R. J., … & Ryan, P. G. (2020). Marine hotspots of activity inform protection of a threatened community of pelagic species in a large oceanic jurisdiction. Animal Conservation, 23(5), 585-596.
Published on 14 June 2024
I’ve started a new podcast, focusing on open source and the climate crisis. Fundamentally, I think having open models and open source code reduces friction in development, and allows for greater uptake and usage - crucially important factors when dealing with the largest issue humanity has ever faced.
OSS for Climate Podcast
If you have the time, listen, and share.
Tobias Augspurger of opensustain.tech, my colleague on this and the funder for the initial six podcasts, and I have written a blogpost that is now on opensource.net. The text of that is below.
Why Climate Needs Open Source Action
Since 2017, SustainOSS has been a community of people who think about what we can do to make open source software more sustainable. We’ve talked about finding better ways to compensate coders, building better communities, and welcoming more diverse voices into the open source ecosystem.
Many of these conversations have taken place on the Sustain Podcast. However, almost none of these conversations have been about the intersection of open source and environmental sustainability.
It’s all over the news that software energy consumption is bad for the climate. What’s rarely talked about, though, is how software actually enables climate science and sustainable technology, especially when it’s open source. It’s the glue that brings together scientists from all disciplines—biosphere, hydrosphere, atmosphere—to create the highly complex collaborative earth and climate models that allow us to forecast what our future might look like if we continue to behave as we have in the past.
This is why we created the OSS for Climate podcast. This is a new initiative within the Sustain ecosystem hosted by Richard Littauer, the main host for the Sustain Podcast, who has recorded hundreds of conversations on this topic. The podcast is a collaboration with OpenSustain.tech, a free community accelerating open and sustainable technology. This podcast aims to fill the gap in discussions about how open source can be a key driver for climate action and sustainability.
Shining a light on those who take action
OSS for Climate highlights individuals and projects, aiming to provide support in terms of funding, sustainability, and onboarding new contributors. It will explore the systematic changes open source can provide for climate action, addressing issues of transparency and trust, and emphasizing the critical role open source plays in our efforts to combat climate change.
We need a place to give a voice to the people behind the projects, to better understand their needs and perspectives on how open source can accelerate action on the climate crisis. Our first study, The Open Source Sustainability Ecosystem, showed us that such interviews are essential to understanding the very nature of open software’s impact in this area. For this reason, we decided to combine the obvious synergies of our study with a podcast to make the interviews accessible to everyone.
That is why we started the Open Source Software for Climate podcast with Richard Littauer. He is the ideal voice to bridge the gap between what sustains the open source ecosystem and how open source can sustain the natural shared ecosystem on which we all depend. As an open source wizard, community builder, passionate birder, soon-to-be Ph.D. ecologist, and interviewer of more than 300 people in the open source community, there is no better host for this.
Comprehensive Coverage
On OpenSustain.tech we’ve listed around 1400 projects with an active community. Almost all of them are relevant to climate change. Even if you just look at climate models, the amount of software needed to make good predictions is significant.
If you include everything that affects the climate, everything that is affected by the climate, and all the technologies needed to adapt to and combat climate change, you end up with a significant number of projects. Some of these are massive projects that are used by tens of thousands of developers; others are dependencies, part of the digital infrastructure that underpins our shared world.
Involving Diverse Stakeholders
Solving a global problem such as climate change requires global collaboration between different fields, making open source the most relevant methodology for combining knowledge and skills from different fields.
This includes not only scientists, but likewise entrepreneurs, activists, politicians and citizens who are empowered by open source to participate in the implementation of solutions. We want to know how open source can change the spread of climate solutions in the world, and how climate justice can be achieved by enabling access to technology and knowledge for those who are most vulnerable and affected.
Our mission
Our mission is to promote and support the entire open source community in the area of climate and sustainability. To do this, we combined data science based on open source analytics and data mining with ecosyste.ms to discover not just those who are the loudest, but the people who are making significant contributions in the background.
Stay tuned for insightful episodes that bring to light the important work being done in the intersection of open source and climate action. Listen to OSS for Climate for a deeper understanding of how open source software can contribute to a sustainable future. Are you still on fire and want to find out more about how you can join our mission? Do you have a special person or project that you would like us to spotlight? Feel encouraged to contact Richard Littauer directly or the OpenSustain.tech community on Mastodon.