Python Friday #187: Extracting the NDC Talks Data From YouTube

It is time for a hands-on walkthrough to collect data and turn it into useful plots. For this exercise we are going to collect metadata from YouTube and run it through Seaborn to figure out if my hunch is correct or not.

This post is part of my journey to learn Python. You can find the code for this post in my PythonFriday repository on GitHub.

 

Thesis: The talks at NDC Oslo 2023 were shorter than usual

I was not happy with the length of the talks I attended at this year’s NDC Oslo conference. Whenever I checked their YouTube channel, I saw more and more talks that were significantly short of their 60-minute slot. Am I just ignoring all evidence that would prove my gut feeling wrong, letting confirmation bias get the better of me, or can I prove my point with data?

If I am right, I expect to find these 3 measurable results for talks at NDC Oslo 2023 compared with the previous years:

  1. The average duration is lower.
  2. There are more talks below 50 minutes.
  3. The median duration is lower.

 

YouTube as our data source

NDC Conferences publish the talks of their conferences on their YouTube channel. While there is a bit of advertising before and after the recorded talks, this overhead is negligible, and we can use the video length as a proxy measurement for the talk duration.

The YouTube channel of NDC currently has 3301 videos. Besides the talks, we also find promo videos for their conferences and workshops. That should give us enough data points to (dis-)prove the thesis.

 

How to extract the metadata from YouTube?

There are 3 main approaches we can use for scraping YouTube:

  1. We use the official API and iterate through the videos in the channel.
  2. We automate a browser (with Selenium or Playwright), work through the endless scroll, and extract the data with Beautiful Soup.
  3. We use a dedicated scraping tool for YouTube.

The API does not seem to offer the metadata I am interested in, and browser automation was already the topic of many posts in the past. Therefore, it is time for option 3.

 

Install Scrapetube

We can install Scrapetube with this command:
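```
pip install scrapetube
```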

To run the examples, we need the Id of the NDC channel. We find the Id on the About page, where we can copy it via the share icon:

Click on the share icon and then on Copy channel ID.

This should give us an Id like this one:

UCTdw38Cw6jcm0atBPA39a0Q

To check if everything is in place, we can run this example code with the Id we extracted:
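A minimal sketch, assuming scrapetube.get_channel() accepts the channel Id and yields one dictionary per video with a videoId entry:

```python
import itertools

import scrapetube

channel_id = "UCTdw38Cw6jcm0atBPA39a0Q"

# get_channel() returns a generator that yields one dictionary per video
videos = scrapetube.get_channel(channel_id)

# print the Ids of the first few videos to verify that everything works
for video in itertools.islice(videos, 9):
    print(video["videoId"])
```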

This should give us a list of video Ids:

tKnNqftbT4Q
1DuxTlvmaNM
AYU0vw6IyUY
1K1LRzXeO_4
iDWwoz9ZUzw
I2NqMbKc8K4
cae0jXRrNno
vXLz_U4xzkc
Nyj2O8gHJZM

 

Fight with Scrapetube

While Scrapetube has a small API, an example of how to access the return values is nowhere to be found. We can overcome this obstacle with a little helper that prints the keys and values of the result set:
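One way to do that is to pretty-print the raw dictionaries as JSON (a small sketch, using the same get_channel() call as above):

```python
import json

import scrapetube

channel_id = "UCTdw38Cw6jcm0atBPA39a0Q"

# dump every raw dictionary Scrapetube yields so we can inspect
# the available keys and their nested structure
for video in scrapetube.get_channel(channel_id):
    print(json.dumps(video, indent=2))
    print("-" * 40)
```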

Start the script and kill it once a few results have been printed. From this output we can work out the dictionary and list accessors we need to extract the data we are interested in.

 

The YouTube data extractor

After a few rounds of experimenting with the return values of Scrapetube, I ended up with this extractor tool.

First, we need to import the libraries we use:
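A minimal set for the sketches in this post (the original extractor may import a few more):

```python
import pandas as pd
import scrapetube
```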

The duration of the talk is in the format “hours:minutes:seconds”, which is a pain to work with in Pandas. Therefore, we convert the duration into minutes and round the seconds up or down to a full minute:
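A sketch of such a conversion; the helper name to_minutes() is my own, and the input may also come as “minutes:seconds” for shorter videos:

```python
def to_minutes(duration: str) -> int:
    """Convert a duration like '1:02:33' or '58:47' into full minutes."""
    parts = [int(part) for part in duration.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours (and minutes)
    hours, minutes, seconds = parts
    total = hours * 60 + minutes
    if seconds >= 30:  # round 30 seconds or more up to the next minute
        total += 1
    return total
```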

The format of the title changed over time. We can start with “title – speaker – conference” and then try to match partial results to something useful:
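A best-effort sketch of such a parser; the helper name split_title() and the separator " - " are assumptions, and real titles will need more care:

```python
def split_title(title: str) -> tuple[str, str, str]:
    """Split 'title - speaker - conference' into its parts (best effort)."""
    parts = [part.strip() for part in title.split(" - ")]
    talk = parts[0]
    speaker = parts[1] if len(parts) > 1 else ""
    conference = parts[-1] if len(parts) > 2 else ""
    return talk, speaker, conference
```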

With the helper methods in place, we can use Scrapetube to fetch the videos from the NDC channel, extract the data, create a DataFrame in Pandas, and export it as a CSV file:
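A sketch of that loop, using the helpers from above; the nested keys for the title and the duration, as well as the file name ndc_talks.csv, are assumptions:

```python
channel_id = "UCTdw38Cw6jcm0atBPA39a0Q"

rows = []
for video in scrapetube.get_channel(channel_id):
    length = video.get("lengthText")
    if not length:
        continue  # skip entries without a duration (e.g. upcoming premieres)

    title = video["title"]["runs"][0]["text"]
    talk, speaker, conference = split_title(title)

    rows.append({
        "video_id": video["videoId"],
        "title": talk,
        "speaker": speaker,
        "conference": conference,
        "minutes": to_minutes(length["simpleText"]),
    })

df = pd.DataFrame(rows)
df.to_csv("ndc_talks.csv", index=False)
```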

When we run the extractor, we get a CSV file with around 3200 lines. We can check the conferences it could match with this code:
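For example with value_counts() on the conference column (the column and file names follow the sketch above):

```python
import pandas as pd

df = pd.read_csv("ndc_talks.csv")

# how many videos were matched to each conference?
print(df["conference"].value_counts())
```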

As expected, the many changes to the title format will require a lot of manual work to match each video to the right conference. Nevertheless, for the conferences of the last few years the matching worked rather well, and we should be able to fix the data in a reasonable time frame.

 

Next

We started with our thesis of the shorter talks and found a data source that can (dis-)prove our assumption. With the help of Scrapetube we got the data from YouTube into a CSV file that we can turn into the basis of our analysis next week.
