There are many travel blogs in China and Taiwan that share text comments and photos about different travel spots in Hong Kong. To find out which Hong Kong travel spots are the most popular in the market, we need to collect, clean and analyze all of this structured and unstructured data together before a representative ranking can be drawn up.
The first step of this challenge is to convert that unstructured data into structured data. Participants will be provided with 24 web documents, in Traditional or Simplified Chinese, which were previously collected from different travel blogs. These documents describe the itineraries of bloggers who visited different spots in Hong Kong.
Participants are asked to design and develop algorithms that identify the 5 key data elements (destination name, destination type, destination image, image type, and emotion toward the destination) in the documents and present them in the format of the expected sample output file.
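As a rough illustration of what "structured data" could mean here, the sketch below models one extracted record holding the five data elements and writes records out as CSV. This is only a sketch under stated assumptions: the field names, the column order and the CSV layout are hypothetical placeholders, since the actual layout is defined by the expected sample output file, and the example values are illustrative rather than taken from the 24 documents.

```python
import csv
import io
from dataclasses import dataclass, astuple, fields


@dataclass
class SpotRecord:
    """One extracted record. Field names are hypothetical placeholders;
    the real column names come from the expected sample output file."""
    destination_name: str
    destination_type: str
    destination_image: str   # e.g. an image URL or file name found in the post (assumed)
    image_type: str
    emotion: str             # the blogger's emotion toward the destination


def to_csv(records):
    """Serialize records as CSV with a header row (assumed layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f.name for f in fields(SpotRecord)])  # header from field names
    for record in records:
        writer.writerow(astuple(record))
    return buf.getvalue()


if __name__ == "__main__":
    # Illustrative values only, not extracted from the challenge documents.
    sample = SpotRecord("Victoria Peak", "viewpoint", "photo_01.jpg", "scenery", "positive")
    print(to_csv([sample]))
```

Keeping each record as a typed object, separate from the writer, means the extraction step and the output format can change independently once the sample output file's exact layout is known.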