A brief introduction

This page includes a list of links and other potentially useful references for the RUSKA 2022 event, organized for and by the TUTA community. More specifically, this page serves as supplementary material for a workshop on machine learning (ML) based tools for scholars. If you have any questions, comments, or feedback related directly to this material, please contact Tomasz Mucha. The tools presented here were selected based on my own experiences and explorations.

Speech-to-text examples and ideas

Transcribe a quick note on the go

While listening to a podcast, I wanted to quickly take some notes. To do that, I used Google Assistant to activate Live Transcribe (I’m pretty sure there are good alternatives to this app for iPhone users).

Live transcribe

Transcribe a meeting or an interview

When it comes to online meetings, most of us think of Zoom first. However, it seems that automatic transcription is not enabled in Aalto’s default Zoom settings, so MS Teams turns out to be a better tool for this job.

After starting a meeting, you need to start both recording and transcription.

MS Teams transcribe

After that, you should see the transcribed text appear almost in real time.

MS Teams transcript shown

When you finish recording and close the meeting, the video recording will land in your OneDrive folder titled “Recordings”. Getting the transcript still takes a few more steps. You will need to find the file in your OneDrive using the online version, available via the MS Teams app or a web browser, rather than your computer’s normal file browser/explorer. Once you open the video recording, you should be able to locate the transcript and the “Download” button.

Download transcript

The transcript will look as follows:

WEBVTT

67fab71c-209a-4bcc-8a57-e4a76b5fc99a/12-0
00:00:15.864 --> 00:00:19.684
OK, now it seems that the.

67fab71c-209a-4bcc-8a57-e4a76b5fc99a/20-0
00:00:20.504 --> 00:00:24.964
Tool Microsoft Teams is
recording what I'm saying.

67fab71c-209a-4bcc-8a57-e4a76b5fc99a/28-0
00:00:27.544 --> 00:00:31.114
May I don't have any other
participants in this meeting.

67fab71c-209a-4bcc-8a57-e4a76b5fc99a/33-0
00:00:32.094 --> 00:00:34.164
But whatever I say.

67fab71c-209a-4bcc-8a57-e4a76b5fc99a/48-0
00:00:35.054 --> 00:00:39.764
At least more or less
approximately. He's appearing as

67fab71c-209a-4bcc-8a57-e4a76b5fc99a/48-1
00:00:39.764 --> 00:00:41.134
transcript here.

67fab71c-209a-4bcc-8a57-e4a76b5fc99a/56-0
00:00:45.554 --> 00:00:49.024
So what you can see is that.

67fab71c-209a-4bcc-8a57-e4a76b5fc99a/71-0
00:00:49.794 --> 00:00:53.407
You're going to have a
conversation that is being

67fab71c-209a-4bcc-8a57-e4a76b5fc99a/71-1
00:00:53.407 --> 00:00:56.514
recorded and transcribed at the
same time.
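If you would rather work with plain running text than with cue-by-cue WEBVTT, a few lines of Python can strip the cue identifiers and timestamps and re-join sentences that were split across cues (like the “/48-0” and “/48-1” cues above). This is a minimal sketch, not a Microsoft tool: it assumes the layout shown above (identifier line, timing line, then one or more text lines per cue, with cues separated by blank lines), and the function name vtt_to_text is just an illustrative choice.

```python
import re

# A WEBVTT timing line, e.g. "00:00:15.864 --> 00:00:19.684" (hours optional).
TIMING = re.compile(r"^(?:\d+:)?\d{2}:\d{2}\.\d{3}\s+-->")


def vtt_to_text(vtt: str) -> str:
    """Join the caption text of a WEBVTT transcript into one string.

    Assumes the layout of the MS Teams export: each cue is an identifier
    line, a timing line and one or more text lines, and cues are separated
    by blank lines. The "WEBVTT" header and cue identifiers are dropped.
    """
    vtt = vtt.replace("\r\n", "\n")  # tolerate Windows line endings
    pieces = []
    for block in vtt.strip().split("\n\n"):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if TIMING.match(line):
                # Everything after the timing line is caption text.
                pieces.extend(t.strip() for t in lines[i + 1:] if t.strip())
                break
    return " ".join(pieces)


# Two consecutive cues copied from the transcript above.
SAMPLE = """WEBVTT

67fab71c-209a-4bcc-8a57-e4a76b5fc99a/48-0
00:00:35.054 --> 00:00:39.764
At least more or less
approximately. He's appearing as

67fab71c-209a-4bcc-8a57-e4a76b5fc99a/48-1
00:00:39.764 --> 00:00:41.134
transcript here.
"""

print(vtt_to_text(SAMPLE))
# At least more or less approximately. He's appearing as transcript here.
```

To convert your own download, pass in the file’s contents, for example vtt_to_text(open("transcript.vtt", encoding="utf-8").read()), where transcript.vtt is a placeholder file name. Splitting on blank lines keeps the parser short; it would need more care for WEBVTT files that contain NOTE or STYLE blocks, which the Teams export above does not seem to use.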

What if my meeting is on Zoom or in person?

You can still generate a transcript if you have a video (.mp4) or audio (.mp3) recording of that event. (Bonus tip: even in a physical meeting, you can generate a transcript automatically if you turn on MS Teams and record the meeting there, with yourself as the only participant.)

Let’s start with an audio recording (an .mp3 file or an audio track extracted from a video). This approach relies on MS Word. To convert an existing video (.mp4 file) into a transcript, you can use another tool from the Office 365 suite: MS Stream.
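If you only have a video but want to use the audio route, one way to extract the audio track is the free command-line tool FFmpeg. It is not part of Office 365, it has to be installed separately, and your FFmpeg build needs the libmp3lame encoder (most common builds include it). The Python sketch below assembles the FFmpeg command and runs it only if FFmpeg is found on your PATH; the function names and file names are hypothetical.

```python
import shutil
import subprocess


def extract_audio_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an FFmpeg command that drops the video stream and saves the audio as MP3."""
    return [
        "ffmpeg",
        "-i", video_path,      # input video, e.g. a Zoom .mp4 recording
        "-vn",                 # disable video: keep only the audio
        "-c:a", "libmp3lame",  # encode the audio with the LAME MP3 encoder
        "-q:a", "2",           # variable bitrate, high quality (0 = best, 9 = worst)
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run the command built above; raises if FFmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("FFmpeg was not found on PATH")
    subprocess.run(extract_audio_cmd(video_path, audio_path), check=True)
```

For example, extract_audio("meeting.mp4", "meeting.mp3") would produce an .mp3 file that you can then upload to Word’s transcription feature. Keeping the command in its own function makes it easy to inspect (or paste into a terminal) before running anything.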

Generating transcript from an audio recording using Word

Start with Word, but not your desktop version: use the online version instead, which you can access via MS Teams.

To access the functionality we need, you will have to open the file in your browser.

Once you are editing the Word file in a browser, you should have access to more features. Choose “Transcribe”.

After uploading the .mp3 file that you want to transcribe and selecting the language, you should be able to see the transcript. This approach has several nice features – you can:

  • change the audio playback speed
  • select the timestamp from which you want to start listening
  • edit the transcript
  • insert it into the Word document
  • assign names to individual speakers (which have been automatically distinguished by voice).

Generating transcript from a video recording using Stream

First, you need to upload a video to MS Stream. Use this link: https://web.microsoftstream.com/upload (an Aalto login might be required if you are not recognized).

While the video upload is running in the background, you can already choose to keep the video private. This is probably what you want for most videos that you upload only to generate transcripts. By default, MS Stream assumes that you want to make the uploaded video available to all Aalto users, so make sure you untick the box in the Permissions section. (BTW, if you forget to do this, the video will not immediately become available to everyone; that only happens after you publish it. So “privacy by default” still applies here.)

After the video is uploaded, go to “My content” and select “Update video details.” From there, you will be able to download the automatically generated transcript.

How about dictating my own notes or other text?

All of the methods presented so far work for dictation, but they are not the most efficient ones. Probably the most direct approach is to dictate text straight into Word. This time, you can do that in your desktop version of Word as well.

Text-to-speech examples and ideas

Now, let’s swap the objectives: we have a document with text, and we want the computer to read it aloud to us. Again, you can leverage MS Word to carry out this task. Let’s check two common scenarios:

  • reading aloud your own text
  • reading aloud an article that is in PDF format.

Once you have the document open, just click on “Read Aloud.”

You will then be able to select where in the text the reading should start and at what pace, and you can also choose between several voices. I recommend adjusting the reading speed to match the difficulty of the text and the level of understanding you are after. With some practice, you will probably be able to increase your reading speed and retention at the same time.

To improve your concentration and comprehension, you might want to experiment with enabling Immersive Reader before you use the Read Aloud feature. Immersive Reader hides some of the clutter on the screen and should help you focus while reading.

Once you have used these features in Word, you are just one step away from reading scientific articles with text-to-speech. You just need to open a PDF file directly from the Word application, the same way you open any existing Word file. As long as the PDF was not created from scanned pages and is not locked in some way, you should be able to have it read aloud like any other text.

Use text-to-speech on your mobile

Similar functionality is available on mobile devices as well. Two apps that have been successfully used by TUTA researchers are:

Quick dive into new literature using topic modeling

“A topic model is a type of statistical model for discovering the abstract ‘topics’ that occur in a collection of documents.”

Source: Wikipedia

Let’s say you want to quickly get an overview of the topics recently addressed by articles dealing with sustainability in supply chain management. We can pull these articles from Scopus using the following advanced search query:

You can export the results in CSV format. The resulting file might look like this. (If you want to view it, try downloading it and opening it in Excel.)
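Before feeding the export into a topic model, it can help to check how many usable abstracts it actually contains. Here is a minimal, standard-library-only sketch. It assumes the abstracts sit in a column named “Abstract” (check the header row of your own file) and that missing abstracts show up either as empty cells or as the placeholder “[No abstract available]”; both are assumptions, and load_abstracts is a hypothetical helper name.

```python
import csv
import io


def load_abstracts(csv_text: str, column: str = "Abstract") -> list[str]:
    """Return the non-empty abstracts from a Scopus-style CSV export.

    Assumptions: abstracts live in the given column, and a missing abstract
    is either an empty cell or the placeholder "[No abstract available]".
    """
    reader = csv.DictReader(io.StringIO(csv_text))  # handles quoted commas
    abstracts = []
    for row in reader:
        text = (row.get(column) or "").strip()
        if text and text != "[No abstract available]":
            abstracts.append(text)
    return abstracts
```

To use it on a downloaded file, read the file with encoding="utf-8-sig" (which also strips a byte-order mark if the file starts with one), for example: with open("scopus.csv", encoding="utf-8-sig") as f: abstracts = load_abstracts(f.read()). Then len(abstracts) tells you how many documents the topic model will actually see.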

You can leverage topic modeling to kick-start your literature review even if you don’t know anything about topic modeling. It is just a tool, and it can also be used in a very lightweight way.

You can explore the topics in the abstracts retrieved from Scopus using this Google Colab notebook.

Tip: you can reuse this Colab notebook for your own analysis. All you need to do is define:

  • url (the link where the CSV file with the abstracts is located)
  • no_topics (how many topics you’d like the algorithm to identify)
  • no_top_words (how many top words you’d like the results to show)
  • no_top_documents (how many top documents you’d like the results to show).
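To make the roles of no_top_words and no_top_documents more concrete, here is a hypothetical, standard-library-only sketch of the reporting step that typically follows fitting a topic model such as LDA or NMF. It is not the Colab’s own code: the function names and data layout are assumptions. Given a topic-word weight matrix and a document-topic weight matrix, it lists the highest-weighted words and documents for each topic.

```python
def top_indices(weights: list[float], n: int) -> list[int]:
    """Indices of the n largest values, largest first."""
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:n]


def describe_topics(
    topic_word: list[list[float]],  # topic_word[k][j]: weight of vocabulary[j] in topic k
    doc_topic: list[list[float]],   # doc_topic[d][k]: weight of topic k in documents[d]
    vocabulary: list[str],
    documents: list[str],
    no_top_words: int,
    no_top_documents: int,
) -> list[tuple[list[str], list[str]]]:
    """Return (top words, top documents) for each topic."""
    report = []
    for k, word_weights in enumerate(topic_word):
        words = [vocabulary[j] for j in top_indices(word_weights, no_top_words)]
        # Column k of doc_topic: how strongly each document expresses topic k.
        doc_weights = [row[k] for row in doc_topic]
        docs = [documents[d] for d in top_indices(doc_weights, no_top_documents)]
        report.append((words, docs))
    return report
```

In a real notebook, the two matrices would come from a fitted model; with scikit-learn’s LatentDirichletAllocation or NMF, for instance, the components_ attribute holds topic-word weights and transform() returns document-topic weights (both NumPy arrays, convertible with .tolist()). Raising no_top_words gives a fuller picture of each topic, while no_top_documents controls how many example abstracts you read per topic.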

Generate images using words

The quality of images generated automatically from text is improving rapidly. You can often generate such images and use them without copyright restrictions, although the exact terms depend on the license of the model or service you use.

Try generating some images using the latest Stable Diffusion model.

A Finnish forest and lake landscape

Then, try something more unexpected:

A camel surrounded by Finnish forest and lake landscape

If you’d like to get the results faster and have more control over the outputs, you’ll need to create an account (limited free trial) for: