I know that I have not written anything here in a long time, even though there have been some new releases of the Stats Page, Graphs, Reports and API sites since my last post. Even though development has been continuing, I haven’t had a lot of time to write out posts detailing those updates. Instead, I have been including change logs in each project’s Git repository.
Another bit of Stats-related news is that version 1 of the Stats API will be turned off at the end of this month, January 31, 2023. If you are currently using Stats API v1 and want to use Stats API v2, I would highly recommend that you review the “Wait Wait Stats API v2.0 Changes” document for information on migrating to the new API version. After January 31, 2023, calls made to the Stats API v1 endpoints will return an HTTP status code of 404.
I also made the decision to deactivate my Twitter account and have set up my own presence out in the fediverse by running my own Mastodon instance. If you are part of the fediverse, you can follow me at @[email protected]. I have set up a bot account on my instance, @[email protected], that regularly scrapes the Wait Wait… Don’t Tell Me! podcast feed and publishes a new post when a new episode appears.
One last bit of news: I will be publishing new posts and updates on my new blog at blog.linh.social. I will be keeping this site up and running for the foreseeable future.
One feature for the Stats Page that had been floating around in my mind since I started developing Version 4 was a way to get a random show that I could just queue up and listen to. I finally had a couple of hours to build out the logic to add not only a randomizer for shows, but also for guests, hosts, locations and panelists.
What made this easy to implement across the board was being able to use MySQL/MariaDB’s RAND() function in queries. Otherwise, I would have needed to pull down a list of valid IDs for each respective table and use Python’s RNG to pick one at random. While that would result in better randomization over the long run, it would add to the processing time. I wanted it to be as quick and efficient as possible from the application point of view. The RAND() function in MySQL/MariaDB may not scale all too well for tables that have record counts in the tens of thousands. Of the tables in scope, the one with the largest number of records is still in the low thousands.
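To make the trade-off concrete, here is a minimal sketch of both approaches. It is not the Stats Page's actual code: the table and column names (`ww_guests`, `guestid`) are made up for illustration, and an in-memory SQLite database stands in for MySQL/MariaDB so that the example runs anywhere. SQLite spells the function `RANDOM()`; the MySQL/MariaDB form is shown in the docstring.

```python
import random
import sqlite3


def random_id_in_database(conn):
    """Let the database pick one row at random.

    MySQL/MariaDB equivalent (hypothetical table and column names):
        SELECT guestid FROM ww_guests ORDER BY RAND() LIMIT 1;
    """
    row = conn.execute(
        "SELECT guestid FROM ww_guests ORDER BY RANDOM() LIMIT 1;"
    ).fetchone()
    return row[0] if row else None


def random_id_in_python(conn):
    """Alternative: pull down every valid ID, then pick one with Python's RNG."""
    ids = [row[0] for row in conn.execute("SELECT guestid FROM ww_guests;")]
    return random.choice(ids) if ids else None


def build_demo_database():
    """Create a throwaway in-memory table with three made-up guest IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ww_guests (guestid INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO ww_guests VALUES (?)", [(1,), (2,), (3,)])
    return conn
```

The first function is a single round trip that returns one value; the second transfers every ID to the application before choosing, which is the extra processing time mentioned above.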
From the front-end perspective, I added a “Random Show” icon in the navigation bar and item in the slide-out navigation menu for devices with smaller screens. There is also a link to load a random show from the main “Shows” page. The randomizer will only include shows that have a show date that is less than or equal to the current date. This should prevent future shows from being selected.
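A minimal sketch of that date filter, assuming a hypothetical `ww_shows` table with `showid` and `showdate` columns. An in-memory SQLite database stands in for MySQL/MariaDB so that the example is runnable; the MySQL/MariaDB spelling of the query is in the docstring.

```python
import sqlite3


def random_aired_show_id(conn):
    """Return a random show ID whose date is today or earlier.

    MySQL/MariaDB equivalent (hypothetical names):
        SELECT showid FROM ww_shows
        WHERE showdate <= CURDATE() ORDER BY RAND() LIMIT 1;
    """
    row = conn.execute(
        "SELECT showid FROM ww_shows "
        "WHERE showdate <= DATE('now') "
        "ORDER BY RANDOM() LIMIT 1;"
    ).fetchone()
    return row[0] if row else None


# Throwaway data: two past dates and one far-future placeholder.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ww_shows (showid INTEGER PRIMARY KEY, showdate TEXT)")
conn.executemany(
    "INSERT INTO ww_shows VALUES (?, ?)",
    [(1, "2006-01-07"), (2, "2020-04-25"), (3, "9999-12-31")],
)
picked = random_aired_show_id(conn)  # never returns show 3
```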
To get a random guest, host, location, panelist or scorekeeper, there is a link on their respective main section page.
Random Host Link on Hosts Page
Random Guest Link on Guests Page
Random Location Link on Locations Page
Random Panelist Link on Panelists Page
Random Scorekeeper Link on Scorekeepers Page
I am looking at adding the feature to get a random show from a specific year and possibly adding randomizer endpoints to the Stats API.
In addition to the new randomizer feature, the Stats Page now has rudimentary 404 handling and some of the dependent packages have been updated to keep up with security updates.
As always, the source code for the Stats Page is available on GitHub and is published under the terms of Apache License 2.0.
2020 has been a very challenging year for everyone and it has required things to adapt and change as the ongoing human malware event continues to have an impact on us. I have been fortunate enough to continue working from home and have been able to further improve and expand on the Stats Page, Reports and Graphs sites. Some of the changes are behind-the-scenes updates and others are more visible.
Wait Wait in 2020
Due to the requirements of social distancing and limiting travel, Wait Wait… Don’t Tell Me! canceled live tapings of their show and went back to their roots of recording the show from remote (home) studios. Before the human malware event, I had actually planned to travel to Chicago to attend a live taping on my 40th birthday in April. Part of the plan also included giving me the opportunity to request the panel of my choice.
Even though that trip was canceled, the team at Wait Wait were able to get the panelists that I had chosen (Faith Salie, Luke Burbank and Mo Rocca) and there were two nice mentions during the show that aired on April 25th.
Stats Page and API Updates
I may not have gone to Chicago in April, but I did still take time off from work and used some of that time to listen to Wait Wait shows before 2006 and collect Bluff the Listener, Not My Job and Lightning Fill-in-the-Blank data to fill in any gaps that I had in the Stats Page database. The Bluff the Listener data that I collected would be useful for a new feature that I’ll get to later.
Through the rest of the year, I made a couple of minor changes to the data access library that the Stats Page and the Stats API make use of to pull data from the database. Some of those include bug fixes due to missing date to string conversions and data formatting to make things more consistent. The Stats API also got a fix to correct a few mistakes that I made when copying code to handle requests.
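As an illustration of why a missing date-to-string conversion shows up as a bug, here is a small sketch. The `format_show` helper and the dictionary keys are hypothetical, not the library's actual API; the underlying fact is that Python's `json` module cannot serialize `datetime.date` objects on its own.

```python
import datetime
import json


def format_show(show):
    """Return a copy of a show record with its date converted to an ISO string."""
    formatted = dict(show)
    if isinstance(formatted.get("date"), datetime.date):
        formatted["date"] = formatted["date"].isoformat()
    return formatted


show = {"id": 1, "date": datetime.date(2020, 4, 25)}

try:
    json.dumps(show)  # raises TypeError: date objects are not JSON serializable
    raw_serializable = True
except TypeError:
    raw_serializable = False

# After the conversion, the record serializes without any special handling.
as_json = json.dumps(format_show(show))
```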
The Stats Page got a couple of updates to the CSS to correct issues that the WAVE Web Accessibility Tool brought up when I started more thorough accessibility testing of the Stats Page and Graphs Site. Another accessibility change was to make the page breadcrumbs more usable on devices with smaller, touch-driven screens.
I’ve also added a couple of new features including:
On This Day page that lists shows that aired in the past on the current month and day.
Locations section that provides a list of show locations, with the location text on each show now linking to the corresponding location page.
Lightning Fill-in-the-Blank start and correct values (where data is available) for each show on the panelist pages. This also included changing how the panelists’ ranks are listed.
Updated Show Details
Show Location Details
Updated Panelist Details
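The On This Day lookup boils down to matching on month and day while ignoring the year. A minimal sketch, assuming show dates are already available as `datetime.date` values; the function name and sample dates are illustrative only.

```python
import datetime


def shows_on_this_day(show_dates, today=None):
    """Return past show dates that share today's month and day, oldest first."""
    today = today or datetime.date.today()
    return sorted(
        d for d in show_dates
        if d.month == today.month and d.day == today.day and d < today
    )


# Made-up sample dates for illustration.
sample_dates = [
    datetime.date(2015, 4, 25),
    datetime.date(2020, 4, 25),
    datetime.date(2020, 5, 2),
]
matches = shows_on_this_day(sample_dates, today=datetime.date(2021, 4, 25))
```

If the filtering were done in the database instead, the same idea could be expressed with MySQL/MariaDB's `MONTH()` and `DAY()` functions on the date column.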
Over the past year, I have added a number of new reports to the Reports Site, restructured the landing page and created new section pages to improve site navigation. The new reports touch on statistics surrounding Bluff the Listener data, panelist and Not My Job guest appearances, and Lightning Fill-in-the-Blank scores.
Not to be left out, the Graphs Site also got a couple of updates over the course of the year, both in terms of behind-the-scenes code updates and in features. As with the Stats Page, the Graphs Site did get some accessibility updates that include updating the page breadcrumbs styling and CSS updates to resolve issues that the WAVE Web Accessibility Tool brought up.
Another update to the Graphs Site was to replace the Chart.js charting library, which had been updated throughout the year, with Plotly.js. I had originally chosen Chart.js for its simplicity and minimal code footprint, and it has served its purpose quite well.
Scores by Appearances Graph using Plotly.js
Bluff the Listener Counts Graph
Bluff the Listener Counts for All Years Graph
The switch to Plotly also brings some performance improvements and adds the ability to select a portion of the graph to zoom into and pan along the x-axis for a more detailed look at the data. Plotly also provides additional configuration options for exporting the chart, including customizing the export file name and the image dimensions when configuring each chart.
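For example, the export settings live in Plotly.js's `toImageButtonOptions` config entry. The sketch below builds that config on the Python side so that it could be serialized into a page; the helper name, file name and dimensions are made up for illustration and are not taken from the Graphs Site.

```python
import json


def build_plotly_config(filename, width=1200, height=800):
    """Build a Plotly.js config dict that customizes the image export button."""
    return {
        "responsive": True,
        "toImageButtonOptions": {
            "format": "png",       # Plotly.js also accepts svg, jpeg and webp
            "filename": filename,  # name of the downloaded file, no extension
            "width": width,        # exported image width in pixels
            "height": height,      # exported image height in pixels
        },
    }


# Serialized form, ready to be passed as the config argument of Plotly.newPlot().
config_json = json.dumps(build_plotly_config("bluff-counts-all-years"))
```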
A fellow Wait Wait fan wanted to see how well the listener contestants have been doing at choosing the correct Bluff the Listener story over the years and wanted to see that plotted out. So, I went to work on creating a new set of graphs that plotted the (now more complete) Bluff data, grouped by month, and another graph showing the data across all years.
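The aggregation behind a graph like that can be sketched as a tally of correct and incorrect listener picks per month. The record layout here (show date, chosen panelist, correct panelist) and the sample values are assumptions for illustration, not the actual database schema.

```python
from collections import defaultdict
import datetime


def bluff_counts_by_month(segments):
    """Tally correct and incorrect listener picks per (year, month)."""
    counts = defaultdict(lambda: {"correct": 0, "incorrect": 0})
    for show_date, chosen, correct in segments:
        key = (show_date.year, show_date.month)
        outcome = "correct" if chosen == correct else "incorrect"
        counts[key][outcome] += 1
    return dict(counts)


# Made-up records: (show date, panelist the listener chose, panelist with the true story)
sample = [
    (datetime.date(2020, 4, 4), "Panelist A", "Panelist A"),
    (datetime.date(2020, 4, 11), "Panelist B", "Panelist C"),
    (datetime.date(2020, 5, 2), "Panelist C", "Panelist C"),
]
monthly = bluff_counts_by_month(sample)
```

Summing the per-month tallies across every year gives the numbers for the all-years view.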
Although I haven’t formalized a full list of new features that I have planned for the sites in 2021, there are a couple of things that I plan on experimenting with now that I have switched the Graphs Site to use Plotly. There will always be new reports to be created based on requests for data or statistics that I get on a regular basis.
There is still a sliver of hope that audio from the lost years, 1998 and 1999, will be made available as NPR continues their digitization projects. That would provide a chance to fill in my data blanks for those years.
Here’s to hoping that 2021 will be a better year than 2020 has been.
The source code for the Stats Page, Reports Site, Graphs Site and the Stats API is published under the open source Apache License, Version 2.0 and is available on GitHub at github.com/questionlp.
I also dug up the code for versions 2 and 3 of the Stats Page and was able to get them working on more modern versions of Ubuntu and PHP. It did require some changes to the code to bring them up to PHP 7 standards. While neither version is available publicly online, I have them running on virtual machines at home so that I can compare and reference them as needed. The updated source code for both is also released under the Apache License, Version 2.0 and is also available on GitHub.
Due to the ongoing pandemic scuppering my plans to visit Chicago and go to a Wait Wait… Don’t Tell Me! live taping on my 40th birthday, I decided to take some time off of work and do some catch-up work on the Wait Wait Stats Page side of things. So, over the past few days, I have been collecting and entering the missing Bluff the Listener segment data (correct and chosen panelists), Not My Job scores, and some of the Lightning Fill-in-the-Blank segment starting scores and correct answers.
Up until recently, I was missing a varying amount of data for 2001 through 2012. For the most part, I scanned through the segment audio or full show audio that I’ve been able to coax off of NPR’s public servers and recorded the data as I went along. Something that I noted is that the segment order that NPR lists for some of the shows in the early 2000s does not match up with the actual audio segment order.
I was able to use NPR’s transcriptions for the Bluff the Listener segments as a source for the shows in the 2010s. There is one show in 2001 for which I have not been able to find any audio files, and that’s the December 8th show. NPR has files for all other shows from that year except for that one. #sigh
Back in 2018, I made the decision to step away from the codebase for version 3.x of the Stats Page, which was written in PHP and made some use of the Silex micro-framework. While it would have been feasible to continue building on top of that codebase, it would have required a significant re-write in order to meet the requirements that I had for the new version:
Back in March 2019, I was working with Colin Miller at Wait Wait… Don’t Tell Me! on a request to pull data regarding the various Wait Wait shows that have aired, especially data on shows that excluded Best Of or Repeat shows. The purpose of the request would later be revealed to be trying to determine when the 1000th* original show would land on the schedule. In July, the date and the location for the taping of the 1000th* show would start to solidify. The show would be taped in Salt Lake City, Utah on Thursday, October 24th.
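The kind of calculation involved can be sketched as follows: drop the Best Of and Repeat shows, sort what is left by date, and take the n-th entry. The record layout and the schedule below are made up for illustration and do not reflect the real show data.

```python
import datetime


def nth_original_show(shows, n):
    """Return the date of the n-th show that is neither a Best Of nor a repeat."""
    originals = sorted(
        show["date"] for show in shows
        if not show["best_of"] and not show["repeat"]
    )
    return originals[n - 1] if len(originals) >= n else None


# Made-up schedule; the field names are an assumption.
schedule = [
    {"date": datetime.date(2019, 10, 5), "best_of": False, "repeat": False},
    {"date": datetime.date(2019, 10, 12), "best_of": True, "repeat": False},
    {"date": datetime.date(2019, 10, 19), "best_of": False, "repeat": False},
    {"date": datetime.date(2019, 10, 26), "best_of": False, "repeat": False},
]
third_original = nth_original_show(schedule, 3)
```

Because the count depends on every earlier show, a single Best Of show added to the schedule after the fact pushes each later original show back by one position.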
Why Salt Lake City? The very first time that Wait Wait was taped in front of a live audience was at Jewett Center at Westminster College in Salt Lake City back in January 2000. So, it would be apropos that Wait Wait would also tape their 1000th* show in Salt Lake City; but, this time, they would tape two shows at the Delta Performance Hall at Eccles Theatre in Downtown Salt Lake. I had the privilege to attend both tapings and attend an after show party at Impact Hub.
As with almost every taping that I am able to attend, Peter Sagal would shine some of the spotlight on me at the end of the audience Q&A segment after introducing everybody who works behind the scenes of each show. There were a lot of people at the party that asked me about the Stats Page and the history behind it. Let’s just say that I was a bit exhausted by the end of the night and needed a little bit of recovery time.
All that said, I absolutely enjoyed both shows and getting to meet and talk with the panelists, Peter and everyone who worked to make the shows happen each and every week.
Something that I had been working on since learning about the details of the taping of the 1000th* show has been a number of reports that I had put together in celebration of the milestone. One of those was posted as a tweet thread on Twitter detailing the panelists with the longest win streaks in show history. Another report that I created in Python was the number of appearances each panelist has made on the show broken out by year.
Panelist Appearance Report
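Of the two reports, the win streak one has the less obvious logic, so here is a minimal sketch of it. It assumes each panelist's appearances are available oldest first as a list of booleans; what counts as a "win" (for instance, whether a tie for first place qualifies) is left to whoever builds that list, and the sample histories are made up.

```python
def longest_win_streak(results):
    """Return the longest run of consecutive True values in a list of results."""
    longest = current = 0
    for won in results:
        current = current + 1 if won else 0
        longest = max(longest, current)
    return longest


# Made-up appearance histories, oldest first; True means the panelist won that show.
histories = {
    "Panelist A": [True, True, False, True],
    "Panelist B": [False, True, True, True],
}
streaks = {name: longest_win_streak(h) for name, h in histories.items()}
```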
That led me to start building up a new site that would host these kinds of reports and allow me to quickly add new ones as the need arises, or even by request. So, I created a new project using Python, Flask and Jinja, tools that I have been using to build the Wait Wait Stats API and the data queries for reports that I have been creating. This would also give me another chance to get more familiar with Flask and Jinja as I get closer to building out a brand new version of the Stats Page.
In under two weeks, I was able to stand up the new project, port over existing reports, add a few new ones and publish it. The new report site is available at reports.wwdt.me; and, unlike the two existing reports, the site renders all of the reports live. The two older reports would be generated on a schedule as they needed to be served through the same site as the current Stats Page. On the other hand, the new report site runs on a new server using NGINX to serve the Flask app through uWSGI and static files directly. This is nearly identical to the setup that I used for the Stats API and will be using for the new Stats Page.
*: You may have been wondering why I have added an asterisk to each instance of “1000th” in this post. The original calculations did in fact show that the show that would be taped on October 24th and aired the weekend of October 26th should be the 1000th show, but there seems to have been one additional Best Of show that was snuck into the schedule that wasn’t accounted for or was a late decision. You will notice that the show ends up being listed as #999 on the “Original Shows” report.