Category:Person: Difference between revisions

No change in size, 18 June 2010
''In which I try to explain technical and practical aspects of the Semantic Web to a lay audience, of which I am part. '''Contributions welcome''', this is a wiki.''
__NOTOC__


= Overview =


== Ramblings ==
<div style="float: right; margin: 15px">
http://farm3.static.flickr.com/2666/4027443168_879c7b7ccf.jpg
</div>


The Semantic Web is a concept that allows massive, reliable reuse of data. One of the most remarkable things about the Web is that it is based on HTML, a text format that is highly accessible to both humans and computers. Every Web page uses the same syntax to indicate what should be displayed, and all pages use the same retrieval mechanisms. This was a remarkable and unexpected (disruptive) breakthrough in communications, but the way companies jumped in to make the Web more attractive did little to make the exchange of data easier. Efforts over the years have struggled with complexity and standardization, with major initiatives interfering with each other for technical reasons (e.g. Microformats vs RDFa) or while trying to dominate the market.
Prepared for http://semantic-mediawiki.org/wiki/Spring_2010_SMWCon


One of the concerns has been the model for how information will be shared. Today it's common for nonprofit organizations to hoard their information, creating "proprietary databases" they can use to pitch to granting agencies. Another factor is that ignoring standards allows efforts to move ahead on their own terms, without making their systems fit into larger systems, which could slow them down. A third factor is insecurity: an organization may have a perfectly useful database, but its implementation may not compare well to the best technical efforts. There are two main requirements: a well-known mechanism usable by any organization, and the schemas/ontologies that describe how the data will look for reliable re-use.
== Who is a non-technical person ==


Yet the Internet has been mainstream for 15 years, and we're starting to see real breakthroughs in Semantic Web type applications. With unlimited room for improvement by building on data rather than hoarding it, and the recognition of the value of a truly participatory society, many efforts to not share data start to appear ignoble. A new, as yet unidentified sector of public participation is developing, based on the ease and minimal cost of gathering and organizing data, functionality, and interested parties on the Internet. The cost is simply making data re-usable; however, many agencies fear this approach since it will affect their societal placement (and most don't trust 'the masses').
* Their focus is not technology, but they can contribute
* Never learned programming concepts
* Didn't realize Wikipedia can be edited
* Maybe used a web content management system, blog, Facebook
* Busy with their own concerns


== Approaches to Semantic Web applications ==
== Types and motivations of participants==


=== Mining ===
* Traditional '''executive''' — "everyone else is doing it," inexpensive solution
** ideally they will participate but getting them to can be difficult
** may be more cautious about full commitment - license, security, who can access and edit
* '''Creative group''' or '''individual''' — may be inspired but needs constant guidance
* '''Worker bee''' — tasked to use the wiki
** may be less receptive to wiki ideals, make it straightforward
* '''Outside contributors''' - often a stated goal of projects, have their own objectives
** flexible to meet random demands
** fair re-use terms


There are essentially two types of SemWeb applications: mining and intentional semantic development. One mining technique is "scraping," which parses presumably reliable HTML pages. Many citizen projects use this technique to extract public data from recalcitrant government sources, for example [http://www.theyworkforyou.com They Work for You]. Mashups are related: sites like [http://www.housingmaps.com Housing Maps] combine data from disparate sources into one useful interface. However, scraping can easily be foiled by obfuscating low-level structure, intentionally or not.
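To make the scraping idea concrete, here is a minimal sketch using only Python's standard library. The page fragment and the names <code>CellScraper</code> and <code>scrape_rows</code> are hypothetical, invented for illustration; real sites need far more defensive parsing. The last line shows how a small change in low-level structure silently empties the result.

```python
from html.parser import HTMLParser


class CellScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # cells of the row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Trim whitespace once the whole row has been read.
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row[-1] += data


def scrape_rows(html):
    """Return the table cells found in an HTML string, row by row."""
    parser = CellScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical fragment standing in for a public-records page.
PAGE = (
    "<table>"
    "<tr><td>Jane Doe</td><td>Voted yes</td></tr>"
    "<tr><td>John Roe</td><td>Voted no</td></tr>"
    "</table>"
)

print(scrape_rows(PAGE))
# If the publisher swaps <td> for <div>, the scraper finds rows but no cells.
print(scrape_rows(PAGE.replace("td", "div")))
```

The scraper depends entirely on the publisher keeping the same tags, which is why this approach is fragile compared with intentional markup.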
<br class="cleared" />


Another mining approach involves scraping human-oriented text. [http://www.opencalais.com Open Calais] is an infrastructure example of this; [http://healthbase.netbase.com Health Base] is an end-user application. These sites use patterns in human text to try to derive statements. This technique is easily foiled, leading to incorrect observations.
= Goals =


=== Intentional markup ===
== What do they want ==


Intentional semantic development involves explicit markup of text items. Most HTML documents today contain only text and links. Semantically marked up documents have explicit annotations about data objects, identifying them as entities such as people, places, dates, and so on. Relations (links) have explicit meanings.
# to solve their problem, often a "one of those" web site with some special requirements
# something that looks good - '''design is still paramount'''
# to learn about the participatory web
# to have more control over their own site but keep things simple
## usually they don't want to 'innovate,' just do what everyone else is doing
# to work with someone they trust
# don't really seem concerned about "silo" and re-use aspects


In [http://www.foaf-project.org/ FOAF], we can place "me" links on our home page that point to other representations of ourselves. We can also indicate links to friends, business associates, and organizations. It quickly becomes apparent that decentralized Facebook-like sites could be enabled, where individuals publish their information wherever they like, using whatever licenses they like, and sites like Facebook provide their own views of these webs of data.
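As a rough illustration of that decentralized idea, the sketch below treats each person's published data as a set of (subject, property, value) statements and merges them. The <code>foaf:name</code> and <code>foaf:knows</code> properties are real FOAF terms; the example.org and example.net addresses, the sample people, and the helper functions are assumptions made up for this example, and a real application would use an RDF library rather than plain tuples.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

ALICE = "http://example.org/alice#me"  # hypothetical identifiers
BOB = "http://example.net/bob#me"

# Each person publishes statements about themselves on their own site.
site_a = {
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "knows", BOB),
}
site_b = {
    (BOB, FOAF + "name", "Bob"),
    (BOB, FOAF + "knows", ALICE),
}


def merge(*graphs):
    """Combine statements from any number of sites into one graph."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def names_of_friends(graph, person):
    """Names of everyone `person` says they know, if the graph has them."""
    friends = {o for s, p, o in graph if s == person and p == FOAF + "knows"}
    return sorted(o for s, p, o in graph if s in friends and p == FOAF + "name")


print(names_of_friends(site_a, ALICE))                  # Bob's name lives on his own site
print(names_of_friends(merge(site_a, site_b), ALICE))   # visible once both sites are merged
```

Any site can build its own view by choosing which published graphs to merge, without owning the data.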
== What do I want ==


Using RDFa and Microformats, annotations are added to regular HTML elements to give them semantic meaning. A person's information can be marked up with hCard, allowing you to "right click" and add that person to your address book. Similar formats exist for locations and events.
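Here is a small sketch of how a tool might read hCard markup, again using only Python's standard library. The class names <code>vcard</code>, <code>fn</code>, <code>org</code>, <code>tel</code> and <code>email</code> come from hCard; the sample person, <code>HCardParser</code> and <code>parse_hcard</code> are hypothetical, and the parser assumes flat, un-nested properties, which real hCards do not guarantee.

```python
from html.parser import HTMLParser

# The subset of hCard properties this sketch looks for.
HCARD_FIELDS = {"fn", "org", "tel", "email"}


class HCardParser(HTMLParser):
    """Read simple, un-nested hCard properties into a dict."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._field = None  # property currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in HCARD_FIELDS:
                self._field = name
                self.card.setdefault(name, "")
                break

    def handle_endtag(self, tag):
        # Assumes properties are not nested: any closing tag ends the field.
        self._field = None

    def handle_data(self, data):
        if self._field:
            self.card[self._field] += data


def parse_hcard(html):
    """Return the hCard properties found in an HTML string."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.card


# Ordinary HTML; only the class attributes add the semantics.
SAMPLE = """<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Org</span>,
  <span class="tel">555-0100</span>
</div>"""

print(parse_hcard(SAMPLE))
```

Unlike the scraping case, the reader here relies on names the publisher chose on purpose, so a redesign of the page layout does not break it as long as the class names remain.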
# avoid per client custom code, fit things into the developing picture
# '''promote digital literacy - filling out forms isn't it, stop treating computers as a typewriter'''
## reference-able statements, reusable data under fair terms of re-use
# get people to consider issues of site design and how to organize information without overburdening
# '''promote transparency and co-development'''
## help flatten organizations and their external relationships
## don't be fearful and build hidden compromises, open it up
# grow my own skills based on relevant requirements


Google, Yahoo and others use these formats to make their results more reliable. Previously, search engines had to guess which text on a page was its actual content. So if you searched for "frames," looking for picture frames, you were likely to find a page that merely referred to "frames" in its navigation. RDFa and Microformats allow more reliable markup of subjects, allowing meta directories to embed reviews from any cooperating site rather than trying to do everything themselves. Because these reviews link back to the originating site, it's a "win win win" situation for the meta directory, originating site, and end user, with richer, less biased results once a critical mass is reached.
= Success=


The heavyweight options are systems such as RDF and Topic Maps. They provide a complex, interlinked way to describe arbitrary data. Today they are used only for specific projects, but as their use grows we can expect the web to become more interlinked, allowing an endless assemblage of information using the best references.
In order:


Next: [[Semantic Mediawiki and the Semantic Web]]
# Useful one-off resource with lots of development input from stakeholders, possible to build on in future
# contributions by many types of people
# basic editing using forms
# wiki markup, categories
# sharing knowledge, creating more converts
# Creating templates/queries/classes
# understanding of good class design, distributed data, licensing
# reuse ontologies and web-based content
# distributed applications, creating standards


[[Category:SemWeb]]
= Failure =
 
<div style="background: black; padding: 50px">
<center>
http://wiki.zooid.org/images/no-edits.png
</center>
</div>
 
* Commitment vs follow through
** Constant attention, guidance
* Tangly mess
** Better use of SMW features, more forms, patience for gardening
* Misunderstood requirements, not really listening to what they want
** Learning experience
 
= How =
 
Lots and lots of guiding
 
Inspire - self empowerment, learning culture, creating, leading, "coolness" (graphs), "where the web is going," open source and transparency, participatory web
 
Reassure - built on Mediawiki, always exportable
 
Threaten - others are doing it, loss of leadership
 
Blow past increasing complexity of security to simpler wiki model (all private or all public)
 
Really enforce the importance of the discussion tab, history, diff, learning from others ('''view source'''), and site evolution as an interest
 
Appoint leads based on interests, give them responsibilities
 
Peer helpers — spread the virus
 
Translators for those who can't directly contribute
 
Profiles of uses
 
Magic — Exhibit example of copying filtered data and pasting to spreadsheet
 
The importance of design
* promote the cues of Wikipedia but provide something original
 
<div style="margin: 15px">
http://wiki.zooid.org/images/innocell-wp.png
</div>
 
= SMW vs Wordpress =
 
Compare Wordpress vs SMW: "Raskin vs Engelbart," specialized appliance model vs learning to use a computer
 
http://www.retrofacto.com/wp-content/uploads/2009/06/wordpress.jpg
 
* '''Wordpress''' is task-driven software, with forms for everyday tasks. SMW is building blocks, Play-Doh. It is helpful to map MW, SMW, and extensions, but it still won't be the same.
 
Everyday tasks:
* Everything oriented towards content management around blogging
* SEO, user management
* "Delete" content, one click
* SMW site will need a UI (immediate functions) inside a UI (MW)
* Learning curve is constant on Wordpress, gets steep fast using SMW
 
= Summary=
 
* Some successes with non-technical users, usually individuals within orgs
* Patience, constant guidance, listening to requirements are most important
* Build up big expectations but focus on immediate goals
* Some SMW facilities such as task oriented guides would help a lot
** Slick rich page editors - links and annotations - would help a lot
** 'Class' editors work with existing elements
** Visual form editor
* People like visualizations and they can help with shaping and debugging data
* Always focus on their goals rather than ideals, but try to explain the vision, the two should come together
* Use lots of meaningful examples
** Placeography, DiscourseDB, sites related to organization
** Good distributed examples would help too
 
== Questions ==
 
* What are the best extensions to make the site easy to use?
** Do people use the rich text editor?
* Is SMW a good "universal platform"?
 
= Notes =
 
==Notes from conference==
 
* Business people love reports
* Authoring tools - ease of tagging
* Stop saying "semantic" — "knowledge engineering"
 
<headertabs />