The Information Architecture Institute offers a US$ 1000 grant for new ideas in Information Architecture. Fred spoke with the head juror, Andrew Hinton, who suggested submitting projects. The grant is small, but it is enough for us to develop the concept further.
We will write the proposal on this wiki.
-------------------
Integrating and fragmenting data: a proposal for the evolution of folksonomy
Introduction to "Connecting Contents"
Connecting Contents is a project that studies how content is organized, classified, created and remixed on the internet. Promoted by the Faber-Ludens Institute since December 2008, the project is open to participation by any user and is released under the Creative Commons Share Alike license, following a development model similar to that of open source software (Raymond, 2001).
The project's goal is to study the organization and classification of data in order to favor the interpretation of content by users, in a context where the consumption and production of data grow exponentially. The studies already begun within the project are discussed in this article, with a focus on folksonomy, tags, metadata and classifications, and also on the intra-tags proposal.
1. Important concepts: Data, Information and Meta-data
Some terms refer to objects so close to one another that their meanings can be confused. The study "Knowledge Map of Information Science", for example, found more than 130 approaches to defining the terms "data", "information" and "knowledge". It is therefore important to state how some frequently cited terms are approached in this article.
For Valdemar Setzer (1999), data are sequences of quantified or quantifiable symbols that can be fully described through formal and structural representations. Texts, images, sounds and animations are data: "(...) since all of them can be quantified to the point that someone who comes into contact with them may eventually have difficulty distinguishing their reproduction, made from the quantified representation, from the original." (Setzer, 1999, p.2) [our translation; original to go in a footnote].
A datum need not be intelligible or have any connection with what it describes. Information, in turn, is the understanding of what represents something meaningful to an individual. While data is purely objective, that is, it does not depend on its user, information is described in an objective form (text, pictures, etc.), but its meaning is subjective and depends on the user.
Setzer (1999) also defines the differences between data and information: data represents information and can be stored in a computer. Information, however, cannot be processed by the computer, because its meaning depends on who receives it. Data is what the computer works with: it takes the inputs it receives and transforms them into binary data. A computer cannot interpret data and create information. Only humans can work with information. People take this data, interpret it, and mesh it with their culture and knowledge, turning data into information.
1.1 Ambivalence between Data and Meta-data in Information.
The most common definition of metadata comes from its etymological meaning: "data about data". "Metadata is data that describes other data, showing its attributes and characterizing its relations, with a view to its access and potential use" (Oliveira, 2002). [translation or original quotation??] Metadata is therefore understood as data associated with other data in order to help users extend their knowledge of its existence or its characteristics.
The computational logic of using metadata to give meaning to data, however, does not work for people: for them, data and metadata have the same value (i.e., they are ambivalent), as both are interpreted as information. Furthermore, metadata always remains data. When one datum has another as its metadata, it is important to realize that it is in turn metadata of that other datum, in a two-way relationship.
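The two-way relationship described above can be sketched as a symmetric link between data items. This is only an illustrative model, not part of the project's prototypes: the class name `DataLinks`, its methods and the sample items are all hypothetical.

```python
from collections import defaultdict


class DataLinks:
    """Hypothetical store where 'is metadata of' is kept as a two-way link."""

    def __init__(self):
        self._links = defaultdict(set)

    def link(self, a, b):
        # Recording that b describes a also records that a describes b.
        self._links[a].add(b)
        self._links[b].add(a)

    def metadata_of(self, item):
        # Use .get so that a lookup never creates an empty entry.
        return sorted(self._links.get(item, set()))


links = DataLinks()
links.link("photo.jpg", "title: Calm day")
```

Asking for the metadata of the photo returns the title, and asking for the metadata of the title returns the photo: neither item is privileged as "the" data.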
1.2 Limits of Folksonomy and the Semantic Web
Of the data currently on the internet (spreadsheets, databases, wikis, videos, websites, etc.), a large share cannot be used because it is not classified. The Semantic Web therefore presents itself as the solution for organizing today's informational chaos on the web, allowing this content to be connected, reused and redistributed between applications more easily.
The Semantic Web is a proposed framework for the evolution of the web, in which it ceases to be a network of documents and becomes a network of data. According to Oliveira (2002), the Semantic Web presents itself as follows:
"The semantic web will be an extension of the current web, but it will have a structure that makes it possible to understand and manage the contents stored on the web regardless of the form in which they are presented" (Berners-Lee et al, 2001 apud Oliveira, 2002) [our translation; to do: get the original Berners-Lee quotation in English]
Making the web an intelligent system is the great challenge of the Semantic Web. To this end, there are efforts to create an Artificial Intelligence, a branch of computer science that simulates the human ability to think and solve problems, which would allow a machine to interpret data in a distinct way.
However, these proposals for the Semantic Web face many problems, mainly with regard to Artificial Intelligence: Noam Chomsky and other researchers say that a programming language is not in fact a form of language (1999, p.3). Nor could information be inserted into a computer, because it would thereby be reduced to data, whereas information is a semantic process performed by humans.
Metadata is considered one of the key means by which the Semantic Web can create relations and give meaning to data. This understanding stems from a confusion in Semantic Web proposals, which reduce the process of information to a simple connection between data.
--> to be continued!
According to Setzer, an essential difference between data and information is that the former is purely syntactic while the latter necessarily contains semantics (implicit in the word "meaning" used in its characterization). It is precisely through the possibility of using metadata as a "connection" between data that a computer is expected to understand information and give meaning to the Semantic Web. Human beings constantly seek meaning and understanding, and so data, when intelligible, are incorporated by the person as information, in which the data form mental associations with concepts.
Thus, the Semantic Web tends to assume that, for data to become information, it only needs to be associated with other data, forgetting that it is the human reader who associates data as information. More than helping with the classification and retrieval of data on the Semantic Web, metadata pervade the data, to the point that it becomes hard to tell them apart: is the cover of a book data or metadata? On a blog page, is the data the content of a text or the whole page as presented, including the visitors' comments?
Therefore, the great impasse of the Semantic Web is the variability of the human process of interpretation, which makes it difficult or almost impossible for computers to predict it. We propose, then, that the Semantic Web act as a facilitator of its users' interpretation in this process of connecting the data that are in the computer (and on the internet) with their minds.
It is possible, for example, to take advantage of the connections that users themselves make. There is no better connection than the one users themselves propose, as can be seen on sites such as Amazon.com, in which the [...]
[To do: give the Amazon example]
?.? User-generated content on collaborative websites (as a possible answer, and on how collaborative content gives new meaning to data)
Collaboration can therefore be understood as the addition, subtraction or simple transformation of data carried out by more than one author in a shared space. Because their contributions are read in the same context, authors can, in the process of collaboration, change a reader's interpretation of the data in which the collaboration is included.
On a blog, a user's comment turns the page into a work that transcends the information posted by its original author. Considering the comment as metadata (that is, data that refers to other data), the inclusion of metadata can be seen as a collaborative process that gives new meaning to the data.
As examples, one can imagine a photograph's title whose ironic tone allows a different interpretation of the photo, or a blog comment that opens up the possibility of contesting the truthfulness of the commented text. In both situations, one datum carries information about another: in the case of the photograph, the title is a (textual) datum not contained in the photo itself, but describing another datum (the photograph, an image). The title of a photo can change while the photo's data stays the same, for example. The information in the photo, however, will not be the same, since its interpretation can be very different because of the irony of the title.
3. Proposals
Based on the situations and problems presented, the Connecting Contents project developed some proposals to extend the possibilities of folksonomy classification tools, based primarily on the idea of intra-tags, which is introduced in the following items.
The folksonomy tagging system solves some of the conflicts found in taxonomic classification, but it also brings its own dynamics and new conflicts. More tags or fewer? What should be tagged? What is the optimal number of tags? Which information in a piece of content deserves a tag? Should tags refer to the content as a whole, or to each piece of information found in it?
The W3C has identified that, even with the ease of use that tags provide, current folksonomy services focus on the subject (the URL) and the object (the tag itself), while the connection between them often fails. Another problem is the number of tags that are intended to have the same meaning but are written differently (case, use of space or underscore). This lack of standardization creates obstacles to large-scale use, especially for the processing of these data by a computer.
Folksonomy also presents difficulties in classifying very large content (such as movies or books) or content that deals with various subjects (such as a newspaper). The tags on such content may conflict, since large content can carry a great amount of information. Often a tag is attached to a whole piece of content when it makes sense for only a part of it.
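The spelling variants mentioned above (case, space or underscore) can be grouped with a simple normalization step. The sketch below is an assumption about how such grouping could be done, not something specified by the project or the W3C; the function names `normalize_tag` and `group_variants` are hypothetical, and choosing the underscore as the canonical separator is arbitrary.

```python
import re
from collections import defaultdict


def normalize_tag(tag):
    """Lowercase a tag and turn runs of spaces/underscores into one underscore."""
    return re.sub(r"[\s_]+", "_", tag.strip().lower())


def group_variants(tags):
    """Map each normalized form to the list of spellings that produced it."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalize_tag(tag)].append(tag)
    return dict(groups)


variants = group_variants(["Semantic Web", "semantic_web", "SEMANTIC  WEB", "folksonomy"])
```

Here the three spellings of "semantic web" end up under a single key, while "folksonomy" stays on its own. Variants written with no separator at all (for instance "SemanticWeb") are not handled by this sketch.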
3.1 Understanding the "Folksonomy of the Whole"
3.2 The Geography of the Markings
3.3 Intra-tags
The Whole and the Parts... cuts of content
3.4 Prototypes
The concept of intra-tags is still in development, but it is already possible to see some of its applications, which are addressed in this section.
Analyzing tags, which relate to the content as a whole, together with intra-tags, which relate to parts of the content (internal cuts), creates opportunities for relevant data to be found easily and linked with greater precision.
Currently, as already shown, folksonomy only allows a datum to be tagged as a whole. In the example in Figure 8[?], a datum has three tags, each inserted to address everything it presents: the sea, the sand and the coast. Within this datum's markings there are also intra-tags, which specify other tags for what appears in it: the sea, the foam, the starfish, among others. All these intra-tags are presented in the context of the datum, but each refers to a more specific piece of its content, and so each was attached only to the part it concerns, providing more accurate tagging.
Figure 7 – Prototype for inserting and managing intra-tags
In the last frame of [Figure 8??], the user has opened one of a datum's markings to see it as an isolated cut. This cut is thus treated as a new datum, which has its own tags (the same ones that were intra-tags before, when the reference point was the other datum) and its own intra-tags.
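The relationship between a datum, its cuts and their tags can be sketched as a recursive structure. This is a minimal illustration under assumptions of our own: the class `Item`, its fields and the sample names are hypothetical and are not taken from the prototype in the figures.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A datum with tags for the whole and a list of cuts (themselves Items)."""
    name: str
    tags: set = field(default_factory=set)
    cuts: list = field(default_factory=list)

    def intra_tags(self):
        # The intra-tags of a datum are the tags of its cuts.
        found = set()
        for cut in self.cuts:
            found |= cut.tags
        return found


starfish = Item("cut: star-fish", {"star-fish"})
foam = Item("cut: foam", {"foam", "sea"})
beach = Item("beach photo", {"sea", "sand", "coast"}, [starfish, foam])
```

Seen from the beach photo, "star-fish" is an intra-tag; once the cut is opened on its own, the same label is simply one of its tags, and the cut can receive intra-tags of its own.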
For a useful example of the application of intra-tags, imagine a user watching a drama film that has only one funny scene. With whole-content tagging, the user would have to tag the entire movie as "funny", which might confuse him later when he needs to retrieve this information. With intra-tags, the user can tag only the specific scene that he finds funny, making it easy to retrieve in the future. Thus, when the user searches for funny scenes, he finds this specific content rather than the whole movie.
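The movie scenario can be sketched as a search that distinguishes hits on the whole movie from hits on a tagged scene. The classes, the use of seconds to delimit a scene and the sample values are all hypothetical; the source does not specify how a cut of a video would be stored.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    start_s: int  # assumed: a scene is delimited by start/end seconds
    end_s: int
    tags: set


@dataclass
class Movie:
    title: str
    tags: set = field(default_factory=set)
    scenes: list = field(default_factory=list)


def search(movies, tag):
    """Return (title, start_s, end_s) hits; whole-movie hits carry None bounds."""
    hits = []
    for movie in movies:
        if tag in movie.tags:
            hits.append((movie.title, None, None))
        for scene in movie.scenes:
            if tag in scene.tags:
                hits.append((movie.title, scene.start_s, scene.end_s))
    return hits


drama = Movie("Some Drama", {"drama"}, [Scene(1800, 1920, {"funny"})])
funny_hits = search([drama], "funny")
```

Searching for "funny" returns only the tagged scene with its time bounds, while searching for "drama" returns the movie as a whole.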
4. Future research and ideas
The history of folksonomy must be studied in depth to understand the limits and possibilities of this type of classification. The notion of Geography can be improved, and a future notion of "outside-tags" can be introduced to perform the inverse process.
The concepts presented in this paper already have the prototypes shown above, but these prototypes must be refined with more care to yield a concrete proposal for real application. At first, a browser plugin paired with a web page similar to Delicious could work; but for these proposals to be easy and practical to use, the application built needs to work with the current uses of folksonomy, such as mash-ups or Twitter applications like Twidraw or Twitpic, which work with Twitter yet independently of it.
This project has already resulted in two academic publications: an article with the theoretical discussion of the project, presented at the 3rd EBAI, and a poster presented at USIHC 2009, with sub-projects showing the practical application of some ideas from these discussions. These presentations produced positive results, attracting interest from professionals and scholars in the field, who found in our proposal many opportunities for Information Architecture, Library Science, studies of the Semantic Web and the direct activity of internet users.
Finally, we believe that the development of these proposals will benefit both the theory and the practice of IA, allowing broad manipulation of classification systems.
-------------------
Deadline: October 31, 2009
How: Proposals must be written in English and must not exceed 2000 words;
Goals: to encourage researchers and practitioners to investigate IA-specific issues, and to publicize useful work that furthers the information architecture body of knowledge.
Links and information: