Unlocking Cultural Heritage (Fall 2010)

News

Tuesday April 26, 2011 Although the UCH course is already over, initiatives for unlocking cultural heritage material are still ongoing. One of our students, Gaby Neubert-Luckner, recently visited several London-based art history archives and asked them how they are using technology to make their collections available to the general public. A report of her trip can be found here. Thanks to Gaby for sharing this!

Friday January 21 Due to the flu, we regret to announce that we cannot complete the assessments of your end-of-term papers today as promised. We apologise for the delay and hope to have the grades available on Monday.

Wednesday December 22 The deadline for the end-of-term paper is Tuesday January 4, 2011 at 23:59. Please send the paper by email to both of us (maximum file size is 10 MB). We will send you a confirmation of receipt as soon as possible afterwards.

Tuesday December 14 Due to illness, several groups in sessions 1 and 2 cannot present today. We are therefore moving session 2 up to before lunch and starting session 3 at 12:40.

Sunday December 12 A few changes have been made over the weekend. Please find the final-final version of the schedule here and the final set of abstracts here. Note that a few oppositions have been moved around; our apologies to those who have to prepare feedback for a new group. With the schedule now firmly set, we look forward to your presentations on Tuesday!

Friday December 10 The deadline for abstracts has long since passed and the final, updated schedule has been put online here. Be sure to check it out, since some sessions and presentations were moved around. As mentioned before, we encourage you to be there the entire day, but you are required to at least attend your own session. Prepare some questions/comments for the presentation you have been assigned to debate; please print, read, and bring the relevant abstract on Tuesday. The updated set of abstracts will be put online this weekend. And last but not least: don’t be late for your session!

Wednesday December 8 On Tuesday Dec 14 we will have the feedback seminar with end-of-term presentations. We have chosen to have only one track, i.e. not to split you up, so that you will receive broader feedback. The draft schedule can be found here. It includes both the presentations for which we have already received abstracts and some for which abstracts are still missing. The latter have been placed mainly at the end of the schedule. If no abstracts/presentations are available for these, we will move the remaining presentations forward (but keep the order the same).

Each presentation is allotted 20 minutes in total (regardless of the number of authors): 10 minutes for the presentation and 10 minutes for feedback. This makes for a rather long schedule, starting at 10:20 and ending at 17:00 if everyone presents. You are of course encouraged to attend all day. If you cannot be there all day but are presenting, you are required to be there for at least the one of the three sessions in which your presentation is given (that is, for all of session 1A/B, session 2A/B or session 3). Note that we expect you to take part in giving feedback. You may offer constructive comments and questions on all presentations, but we also expect you to prepare specifically for one other presentation as opponents (see the ‘group opponents’ column in the schedule). We expect you to prepare questions, comments and suggestions for that presentation in advance. The abstracts that you can use to prepare these can be found here. All of you should make sure to bring the abstracts so that you can consult them during the day.

As for your presentations, please bring the PowerPoint files on a USB stick, and please be early so that you can copy them to the presentation computer in the break before your session; we don’t want to waste valuable minutes looking for files. Finally, we will also bring the grades for assignment 3. Note that the schedule may be updated before the weekend as we receive a few more abstracts.
 

November 25, 2010 The slides for the four real world case presentations are online!

November 23, 2010 We have made two example submissions for the IR assignment available. Both of these received a 12, but took different approaches. We think it could be very useful for you to study these examples and the way they solved the assignment. You can download example 1 and example 2 by clicking on these links. Note that the user/password combination is the same as for the book chapters you had to download earlier.

November 23, 2010 The trec_eval installation file has disappeared from the TREC website. We have found a copy and placed it locally here: http://itlab.dbit.dk/~blar/trec_eval_latest.tar.gz

November 16, 2010 Slides for lecture 9 are online!

November 14, 2010 Another important message from the Study administration:

The Study Council (Studienævn) at IVA has decided to make it possible for students in the International Master’s Degree programme to upgrade the English optional modules (“Collective Intelligence” and “The concept of Cultural Welfare”) to 15 ECTS, so that you can take these two modules together instead of one constituent module and one optional module in Spring 2011.
Over the whole degree you must, however, take one constituent (20 ECTS) module.

As a consequence, the deadline for choosing modules for the next semester has been postponed to November 17 at 16:00.
You can use Studienet to sign up, or you can send an email to hcn@iva.dk.

Kind Regards
The Study Administration

November 12, 2010 Professor Emeritus Howard White visits RSLIS next week. I have talked him into giving a guest lecture on Tuesday November 16 at 10:30 – 12:30 in the auditorium. Howard is a fascinating scholar, a true polyhistor, and this is a rare chance to meet one of the giants of the field. I hope that many of you can make it. The topic of his talk, which is very pertinent to UCH, is:

Information Science and Relevance Theory

This talk will present concrete examples of how relevance theory (RT) from linguistic pragmatics can add to the explanatory power of information science (IS).  As a dual theory of cognition and communication, RT agrees with 40 years of cognitive information science in all major points, but also improves on it, being better thought through. RT can, for instance, clearly define major terms that have long been muddled in IS.  From these definitions, themes emerge that bring together various parts of IS not hitherto well connected, such as research on relevance judgments in document retrieval, and research on least effort in information-seekers’ behavior.  Perhaps more interesting is the light RT can shed on important statistical distributions in IS, including the bibliometric distributions that figure in domain analysis.

November 11, 2010 Important message from the Study administration:

“Dear graduate students,

Modules in Spring 2011
It is now possible to sign up for the modules in Spring 2011.
The sign-up deadline is November 15th.
Immediately after the deadline, it will be decided which of the offered electives will be established.
Thus, if you want to be sure that your elective of choice is established, remember to sign up before November 15th.

Concerning the constituent module
We kindly ask you to enter on “Studienet”, no later than November 25th:

  • Whether you want an internal or external examiner (throughout your education you have one exam with an external examiner and two exams with internal examiners)
  • The names of the other students you are writing your project report with, if you are not writing it alone.
  • The subject of your project report
  • The name of your supervisor.

This information has to be entered at “Studienet” under “Tilmelding” -> “Framelding/bekræftelse”.

Students taking the 15 ECTS optional module who want an external examiner are kindly asked to report this in writing to the study administration.

If you experience problems entering this information at “Studienet”, please send the answers to the above questions by email to the study administration at hcn@iva.dk.

Best regards
The Study Administration”

November 9, 2010 The slides for lecture 8 and the LNCS template are online!

November 4, 2010 A small error in the ‘lemur-search.py’ script has been corrected. See the lab session page for more information.

November 2, 2010 The slides for lecture 7 are online!

October 26, 2010 The slides for lecture 6 are online!

October 18, 2010 Toine and I will sit down tomorrow (Tuesday 19th) to assign people to the voluntary presentations of real-world cases on November 23rd. Gaby has volunteered to do one article (but hasn’t let us know which one yet), so everything is still up for grabs. The last chance to volunteer for this unique opportunity is tomorrow at lunchtime.

October 12, 2010 The slides for lecture 5 are online as well as an updated version of the schedule that has the updated deadline for the first lab session assignment.

October 8, 2010 We have finally settled the number of pages for each assignment type and group size, including the differences between 10 and 15 ECTS students. Please see the final overview here. You will note that the first two lab assignments will have a fixed number of pages:

  • 3 pages for a single student
  • 4 pages for a group of two students
  • 5 pages for a group of three students

This applies regardless of whether you are a 10 or 15 ECTS student. The variation that depends on ECTS is in the length of the end-of-term paper.

October 6, 2010 Our apologies for the errors you may have encountered in the tutorial exercises; we tried to account for all possibilities, but the real world is always more filled with errors than you expect. We believe the errors have been fixed, so go to the tutorial web page and download a new version of the spelling correction scripts. The tutorial has also been updated from the ‘Spelling correction’ part onwards. Because of these problems, we have decided to extend your deadline by a week to Tuesday October 26 at 23:59.

October 5, 2010 All the slides of lecture 4 are now online.

September 24, 2010 Many updates this week: after approval from the Study Council we have produced a new version of the schedule that includes the definitive deadlines for this course. You can find it here.

September 23, 2010 One paper has been taken off the reading list due to time constraints: Cucerzan’s Large-scale Named Entity Disambiguation based on Wikipedia Data. The list of required reading has been updated accordingly.

September 22, 2010 Note that there is no lecture on September 28th; the next lecture is October 5th at 13:00. The second lab session will also take place that day. Please prepare for it by reading the instructions on the last two slides of the lecture on ‘Correction’! Also, make sure you have Ubuntu up and running on at least one of the USB sticks of your group and that you have completed the tutorial. If you are having problems with this, first check the UCH blog, which serves as an F.A.Q. If you still run into problems, drop by the office of Birger (C4.22) or Toine (C4.02). If you have an Apple laptop and would like to use that instead of Ubuntu, stop by Toine’s office. He can help you with installing the necessary software on your Mac.

September 21, 2010 In addition to the already uploaded slides for lecture 3, I have added my notes with the examples I worked out on the blackboard. You can find the PDF with the other lecture slides. Don’t forget to prepare for lab session 2; you can find instructions on the last two slides of the lecture on ‘Correction’!

September 13, 2010 A slightly updated version of the tutorial has been uploaded. You can find it on the Lab session ‘Ubuntu tutorial’ page.

September 10, 2010 If you can, print out the Ubuntu tutorial document for the first lab session on September 14, 2010. It is not necessary, but it will be easier to read the instructions from paper than to have to open a PDF reader in Windows and Ubuntu each time. You can find the PDF-document containing the tutorial on the Lab session ‘Ubuntu tutorial’ page.

About this course

Cultural heritage constitutes a society’s collective memory and is typically curated by libraries, museums, and archives. In recent years, many of these institutions have started digitizing their collections and making them available to the general public. This course deals with this entire process, from digitization, through cleaning and error correction, to providing efficient access to these cultural heritage collections.

The course will focus on three main areas:

  • How can we digitize and automatically enrich our cultural heritage (e.g., data cleaning, NLP for cultural heritage)?
  • How can we provide efficient access to digitized cultural heritage material (e.g., search, browsing & tagging, and recommendation)?
  • Real-world cases from the literature: how are these issues dealt with in actual institutions?

Lecture schedule

Lectures start September 7th, 2010 and will take place in the 13:00-16:45 time slot. Most lectures won’t take this long, but some might, and some might even take more time than we expected, so don’t schedule anything for 10 minutes after the tentative lecture end time. You can download the lecture schedule here. There is also a version available that includes all the course deadlines.

Required reading

You can find a list of the required reading here. Please note that this is not the definitive list; material might be added (or removed) during the term. There are four different types of required reading on the list:

  • Papers and articles available on the Internet. All of these can be found online, either through portals such as ACM.org or through some creative googling.
  • Scanned book chapters. The Danish Copydan copyright agreement allows us to scan and distribute small parts of books for teaching. These will be made available to the students as scanned PDFs below. Username and password will be provided in the first lecture.
    • Mitchell, T. (1997). Machine Learning, McGraw-Hill, chap. 3, pp. 52-60
    • Mitchell, T. (1997). Machine Learning, McGraw-Hill, chap. 8, pp. 230-236
    • Jackson, P. and Moulinier, I. (2002). Natural Language Processing for Online Applications, John Benjamins Publishing Company, chap. 4, pp. 141-144
    • Jackson, P. and Moulinier, I. (2002). Natural Language Processing for Online Applications, John Benjamins Publishing Company, chap. 5, pp. 180-183
  • Book chapters available on the Internet. Some books or book chapters are available for free on the Internet.
  • Books available from the library. Some books are available from the ‘term shelf’ in the RSLIS library.

What all of this means is that you do not have to buy any books for this course. Please see the required reading list for more details.

Lecture slides

Lab sessions

This is a hands-on course, which means we will have four lab sessions spread out over the semester. For these lab sessions we kindly ask you to buy a new USB flash drive of at least 4 GB in size (or use an old one you don’t have any use for anymore). We will be using these in our lab sessions. Note that we will wipe everything on the flash drive, so do not use it as a separate backup for your files!

We will make separate pages available for each of the lab sessions: