Template:AAType: Difference between revisions

From Asian Canadian Wiki
I'll be leading the following session:
<blockquote>Wikis are commonly used for group content development. Semantic Mediawiki adds easier-to-use forms and the ability to annotate distinct content such as places, people and dates for reuse in interactive views and queries. In this session, we'll look at how semantic wikis can be used to develop content for indigenous and linguistic communities, with a practical, hands-on focus including how to create pages, categories and simple ontologies. We will focus on processes to make sites inclusive, and examine fair terms of re-use.
</blockquote>
 
= Web =
 
* International
* Focused on presentation
* Anyone can create a site
** Network effect
* Hyperlinks


= Wikis =
 
* Portland Pattern Repository - 1995
 
Allows easy, quick editing of any page, usually by anyone.
 
* Types of wikis
** Personal wiki
** Group
** Organization
** Topical
** ...
 
== Wikipedia ==
 
<div style="float: right; width: 48%">
http://upload.wikimedia.org/wikipedia/commons/6/63/Wikipedia-logo.png
</div>
 
* 2001
* Anything accepted as notable, neutral point of view
* Careful controls on legal issues so content can be re-used
* Anyone who uses the Internet knows, uses, respects, understands it
* Cultural translations
* Few understand how it's edited
* Designed to be open, but adding controls
* More upcoming focus on media
<br class="cleared" />
 
= Mediawiki =
 
* Free Software (GPL)
* Hundreds of thousands of sites
* Hundreds of extensions
* Page, not content manager
* Designed for open editing, can be closed
 
= Wiki use and editing =
 
On any wiki with critical mass:
 
* Readers
* Drive by editor
* Topic editor
* [http://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits#List Nerd core]
 
* Typical process
** Messes
** Gardening
 
== How to use a wiki ==
 
* Categories
** Many categories, ways of looking at things
** Anyone can manage them
* Recent changes
** [http://en.wikipedia.org/wiki/Special:RecentChanges wp]  vs [http://www.asiancanadianwiki.org/mediawiki/index.php?title=Special:RecentChanges&from=20100716103952&days=300 small site]
* Page history
* [http://en.wikipedia.org/wiki/Talk:Montreal Page discussion] - tags and backstory
* User pages
** http://en.wikipedia.org/wiki/User:Rjwilmsi
 
* Importance of transparency, inclusion
** [http://wiki.zooid.org/wiki/Special:Version Free software]
** [http://wiki.zooid.org/wiki/Special:Export Portable]
 
* Editing - start with an outline
 
= Editing wikis =
 
* Wikis typically use their own markup (syntax)
* Rich/WYSIWYG editors create messes
** Typical word processors add junk; headings can't be re-used and versions can't be compared
** No re-usable meaning added
 
== Progressive learning ==
 
<div style="float: right; width: 48%">
http://upload.wikimedia.org/wikipedia/commons/6/64/MediaWiki_logo_without_tagline.png
</div>
 
* text - just enter it, with a blank line between paragraphs
* <nowiki>= Heading 1 =</nowiki>
* <nowiki>* Bullet point</nowiki>
* <nowiki>[[Wiki link]]</nowiki>
* View source to learn from others
* Edit sections to compartmentalize changes
 
Brackets must be matched!
 
<br class="cleared" />
 
== Elements of markup ==
 
* Headings, bullets, tables, dividers
** HTML code - http://www.asiancanadianwiki.org/wiki/Main_Page
* Links - onsite and off
* Templates
** Consistent appearance and content <nowiki>{{City|Population=30,000}}</nowiki>
** [http://en.wikipedia.org/wiki/Montreal Montreal] , [http://en.wikipedia.org/wiki/Ottawa Ottawa]
 
= Semantic =
 
* Computers are not very good at understanding human language
* Adding re-usable meaning through relationships.
 
 
* Triples - expressing relationships
** Subject, predicate, object
 
 
* David lives in Montreal.
** David: subject
** Lives in: predicate
** Montreal: object
 
 
* Subject, predicate, object have their own relationships.
 
* Montreal is a city.
* A city is a place.
* Montreal is the English word for Montréal.
 
* What are the names for the place where the St. Lawrence and Ottawa rivers meet?
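
The statements above can be sketched as a small in-memory triple store. This is only an illustration: the predicate strings ("lives in", "is a", "is the English word for") and the helper names are assumptions made for the example, not a standard vocabulary or any particular library's API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The example statements, written as subject / predicate / object.
# Predicate wording is an illustrative assumption.
TRIPLES = [
    Triple("David", "lives in", "Montreal"),
    Triple("Montreal", "is a", "city"),
    Triple("city", "is a", "place"),
    Triple("Montreal", "is the English word for", "Montréal"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the given fields; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def is_a(triples, thing, kind):
    """Follow "is a" links transitively, e.g. Montreal -> city -> place."""
    seen = set()
    frontier = [thing]
    while frontier:
        current = frontier.pop()
        if current == kind:
            return True
        if current in seen:
            continue
        seen.add(current)
        frontier.extend(t.obj for t in match(triples, current, "is a"))
    return False


def names_for(triples, thing):
    """Collect every name tied to `thing` by the naming predicate."""
    names = {thing}
    for t in match(triples, predicate="is the English word for"):
        if thing in (t.subject, t.obj):
            names.update((t.subject, t.obj))
    return names
```

With these statements, `is_a(TRIPLES, "Montreal", "place")` holds because the two "is a" links chain together, and `names_for(TRIPLES, "Montreal")` gathers both spellings. Answering the rivers question would need further statements linking the place to the two rivers.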
 
== Web of data ==
 
<div style="float: right; width: 48%">
http://semantic-mediawiki.org/w/images/smw.gif
[[File:200px-Parmenides.jpg]]
</div>
 
* The text is the database.
 
* Making statements across web sites.
* Creating views based on multiple web sites.
* Across wikipedia: Places with populations between 100 and 10,000 with a  
* Technical ontologies
** Classes (categories) - '''person''', place, date
** Properties - birth date, birth place, current location, height
** Inference
*** If a person (class) was born in Canada, they speak English or French
**** Guessiness
* Usefully and easily combine different data sources - data network effects
** Recipe from one site, ingredients from another
** Web searches that only include precise content
** Aggregate reviews from different sites
<br class="cleared" />
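
The inference bullet in the list above (born in Canada, so probably speaks English or French) might be sketched as a rule that produces new statements and marks them as guesses. The `Fact` class, the property names and the `guessed` flag are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    prop: str
    value: str
    guessed: bool = False  # True when produced by a rule rather than stated


def infer_languages(facts):
    """Apply the rule: born in Canada -> speaks English or French (a guess).

    People with an explicitly stated "speaks" fact are left alone, since a
    stated fact should win over a guessed one.
    """
    stated_speakers = {f.subject for f in facts if f.prop == "speaks"}
    inferred = []
    for f in facts:
        if f.prop == "birth place" and f.value == "Canada":
            if f.subject not in stated_speakers:
                inferred.append(
                    Fact(f.subject, "speaks", "English or French", guessed=True)
                )
    return inferred
```

Keeping the `guessed` flag on inferred facts is one way to carry the "guessiness" along, so that views and queries can show or hide inferred statements separately from stated ones.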
 
[[File:HypermediaDiscourse.png|750px]]<br class="cleared" />
[http://www.slideshare.net/sbs/on-social-learning-sensemaking-capacity-and-collective-intelligence On Social Learning, Sensemaking Capacity, and Collective Intelligence]
 
= Semantic Mediawiki =
 
* Practical way of adding semantic data to wikis (mediawiki)
* Anyone can add semantic content
* Inline properties (annotations) <nowiki>[[about::Semantic web]]</nowiki>
* Ontologies - classes (categories and templates)
* Queries
** <nowiki>{{ #ask: [[about::Semantic web]] }}</nowiki>
** <nowiki>{{ #ask: [[Category:Person]] [[Location::Montreal]] }}</nowiki>
* Forms
** Make SMW much easier to use when entering field data
** Inputs for different text types, date, file upload, geo location
* Views
** Map
** [http://subvention.zooid.org/wiki/Quebec_social_orgs_mapped Facet view]
** [http://innovationcell.com/wiki/Work Timeline]
** [http://innovationcell.com/wiki/User:Carlos_Rizo#tab=Graph Graph]
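
To make the idea of inline properties concrete, here is a rough sketch of how `[[property::value]]` annotations could be pulled out of page text and queried. It is a toy stand-in and not how Semantic Mediawiki itself is implemented; the regular expression, the function names and the sample pages are assumptions for the example.

```python
import re

# Matches [[property::value]] (optionally with a |caption), but not plain
# links such as [[People]] or category links such as [[Category:Person]].
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_properties(wikitext):
    """Return (property, value) pairs found in a page's wikitext."""
    return [(p.strip(), v.strip()) for p, v in ANNOTATION.findall(wikitext)]


def ask(pages, prop, value):
    """Toy version of a query: titles of pages annotated with prop::value."""
    return sorted(
        title for title, text in pages.items()
        if (prop, value) in extract_properties(text)
    )


# Hypothetical pages, reusing the annotations from this outline.
PAGES = {
    "Page A": "Notes [[about::Semantic web]] and a link to [[People]].",
    "Page B": "[[knows::Douglas Jack]] [[birth date::June 1, 1970]] "
              "[[Category:Person]]",
}
```

Here `ask(PAGES, "about", "Semantic web")` plays the role of the first `#ask` query above: the annotations live in the page text, and the query result is computed from them.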
 
== Hands on ==
 
* Go to http://subvention.zooid.org
* Create an account
* Create page using search
* Add property <nowiki>[[knows::Douglas Jack]]</nowiki>
* Add property <nowiki>[[birth date::June 1, 1970]]</nowiki>
* Add link <nowiki>[[People]]</nowiki>
* Add <nowiki>[[Category:Person]]</nowiki>
* View resulting [http://subvention.zooid.org/wiki/People collective People page].
 
* Add a new class (category with properties, form)


Presentation: [[Aug 2010 Semantic Mediawiki intro]]
== Sites ==


'''If you'd like to attend, contact me by Friday the 20th: just add your name to the page (not the discussion page) and I'll be notified.'''
* [http://discoursedb.org/wiki/Park51_controversy Discourse DB]
* [http://www.placeography.org/index.php/Main_Page Placeography]


{{iCal|{{PAGENAME}}|August 26, 2010 09:00|August 26, 2010 12:00|McGill Harrington Building}}
= Licensing =


http://tram.mcgill.ca/Contact_us/contactus_2.html
* Creative commons designed to make content clearly reusable
** CC-BY
** No derivatives, no commercial


{{Blikied|Aug 14, 2010}}
[[Category:SemWeb]]

Revision as of 17:06, 20 June 2010

A full week of talks and of learning GATE for text mining, information extraction and language processing. Session wiki

GATE developer screenshot

GATE is written in Java and is very Java-centric. This makes it portable, fast, and heavyweight. A programming library is available. It's 14 years old and has many users and contributors.

Using GATE developer

  • GATE Developer is used to process sets of Language Resources, grouped in a Corpus, using Processing Resources. They are typically saved to a serialized Datastore.
  • ANNIE, VG (verb group) processors.
  • Saving with preserved formatting embeds tags in the original HTML or XML.
    • GATE's graph-based (node/offset) XML and preserved formatting (original XML/HTML) have different strengths

Information Extraction

  • IR - retrieve docs
  • IE - retrieve structured data
  • Knowledge Engineering - rule based
  • Learning Systems - statistical

Old Bailey IE project - old English (Online)

  • POS (part of speech) - assigned in Token (noun, verb, etc.)
  • Gazetteer - gotcha: have to set the initialization parameter listsURL before it's loaded. Must also "save and reinitialize."
  • Gazetteer creates Lookups, then the transducer creates named entities
  • Then the orthomatcher (spelling features in common) coreference associates those
  • Annotation Key sets and annotation comparing
    • Need the setsToKeep key in Document Reset for any pre-annotated texts
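
As a rough picture of the gazetteer step, the sketch below matches token sequences against a word list and emits Lookup-style annotations with character offsets. It is not GATE's Java API or its actual matching algorithm; the function names, the dictionary layout and the sample entries are assumptions, and only the `majorType` feature name is borrowed from GATE's Lookup annotations.

```python
import re


def tokenize(text):
    """Split text into (token, start, end) tuples using word characters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def gazetteer_lookup(text, gazetteer):
    """Return Lookup-style annotations, preferring the longest list entry.

    `gazetteer` maps a lower-cased tuple of words to a type, for example
    ("old", "bailey") -> "location".
    """
    tokens = tokenize(text)
    longest = max((len(entry) for entry in gazetteer), default=0)
    lookups = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for size in range(min(longest, len(tokens) - i), 0, -1):
            words = tuple(t[0].lower() for t in tokens[i:i + size])
            if words in gazetteer:
                start, end = tokens[i][1], tokens[i + size - 1][2]
                lookups.append(
                    {"start": start, "end": end, "text": text[start:end],
                     "majorType": gazetteer[words]}
                )
                i += size
                break
        else:
            # No entry starts at this token; move on to the next one.
            i += 1
    return lookups


# Hypothetical list entries for illustration only.
GAZETTEER = {
    ("old", "bailey"): "location",
    ("london",): "location",
    ("john",): "person_first",
}
```

In the pipeline described above, a later rule-based stage (the transducer) would turn such Lookups into named entities, and the orthomatcher would then link entities whose spellings overlap.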

Evaluation / Metrics

  • Evaluation metric - computed mathematically against human-annotated text
  • Scoring - performance measures for annotation types
  • Result types - Correct, missing, spurious, partially correct (overlapped)
  • Tools > Annotations Diff - comparing human vs machine annotation
  • Corpus > Corpus quality assurance - compare by type
  • (B has to be the generated set)
  • Annotation set transfer (in tools) - transfer between docs in pipeline
    • useful for e.g. HTML that has boilerplate
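
The result types above (correct, missing, spurious, partially correct) are the usual inputs to precision, recall and F1. The sketch below shows one common way to combine them, with partial matches counted as wrong ("strict"), as right ("lenient"), or as half ("average"). The function name and mode labels are assumptions for the example and should be checked against the GATE documentation before relying on them.

```python
def score(correct, partial, missing, spurious, mode="average"):
    """Compute (precision, recall, f1) from annotation comparison counts.

    mode controls how partially correct (overlapping) annotations count:
    "strict" = 0, "lenient" = 1, "average" = 0.5 of a match.
    """
    weight = {"strict": 0.0, "lenient": 1.0, "average": 0.5}[mode]
    matched = correct + weight * partial
    response_total = correct + partial + spurious  # machine annotations
    key_total = correct + partial + missing        # human annotations
    precision = matched / response_total if response_total else 0.0
    recall = matched / key_total if key_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Precision is measured against what the machine produced and recall against what the human annotated, which is why it matters which annotation set is treated as the key and which as the generated one.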

To investigate

  • markupAware for HTML/XML (keeps tags in editor)
  • AnnotationStack
  • Advanced Options




Blikied on Aug 30, 2010