Computational Processing of the Portuguese Language: 7th by Thiago Alexandre Salgueiro Pardo, Lucas Antiqueira, Maria

By Thiago Alexandre Salgueiro Pardo, Lucas Antiqueira, Maria das Graças Volpe Nunes (auth.), Renata Vieira, Paulo Quaresma, Maria das Graças Volpe Nunes, Nuno J. Mamede, Cláudia Oliveira, Maria Carmelita Dias (eds.)

Since 1993, the PROPOR Workshops have become an important forum for researchers involved in the Computational Processing of Portuguese, both written and spoken. This PROPOR Workshop follows previous workshops held in 1993 (Lisbon, Portugal), 1996 (Curitiba, Brazil), 1998 (Porto Alegre, Brazil), 1999 (Évora, Portugal), 2000 (Atibaia, Brazil) and 2003 (Faro, Portugal). The workshop has increasingly contributed to bringing together researchers and partners from both sides of the Atlantic. The constitution of an international Program Committee and the adoption of high-standard refereeing procedures demonstrate the steady development of the field and of its scientific community.

In 2006 PROPOR received 56 paper submissions from 11 different countries: Brazil, Portugal, Spain, Norway, USA, Italy, Japan, France, Canada, Denmark and the UK, of which 9 are represented in the accepted papers. Each submitted paper underwent a careful, triple-blind review by the Program Committee. All those who contributed are mentioned in the following pages. The reviewing process led to the selection of 20 regular papers for oral presentation and 17 short papers for poster sessions, which are published in this volume.

The workshop and this book were structured around the following main topics, seven for full papers: (i) automatic summarization; (ii) resources; (iii) automatic translation; (iv) named entity recognition; (v) tools and frameworks; (vi) systems and models; and another five topics for short papers: (vii) information extraction; (viii) speech processing; (ix) lexicon; (x) morpho-syntactic studies; (xi) web, corpus and evaluation.

Some entities, such as companies and other organizations, may have certain typical end-words or acronyms. A”. Searches were performed using simple Perl scripts. Each complete run took approximately two hours, including manual revision, and we were usually able to extract up to 1000 instances of NE per run (sometimes many more).

2 Retrieving Instances from the Web

For some specific NE categories we found that it was much easier to find domain-specific sites and collect some of the published information.
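The end-word heuristic for organization names described in this excerpt can be illustrated with a short sketch. The authors report using simple Perl scripts, which are not reproduced here; the Python below is only an approximation of the idea. The end-word list (`ORG_END_WORDS`), the function name `extract_org_candidates`, and the sample sentence are all assumptions made for illustration.

```python
import re

# Hypothetical end-words/acronyms typical of company names (an assumption,
# not the list used by the authors).
ORG_END_WORDS = ["S.A.", "Lda.", "Ltda.", "SGPS"]

_END = "|".join(re.escape(w) for w in ORG_END_WORDS)

# A candidate is a run of capitalized words, optionally linked by
# de/da/do/dos/das/e, that ends in one of the end-words.
ORG_PATTERN = re.compile(
    r"((?:[A-ZÀ-Ý][\w&-]*\s+(?:(?:de|da|do|dos|das|e)\s+)?)+(?:" + _END + r"))(?!\w)"
)

def extract_org_candidates(text):
    """Return organization-name candidates found in text, in order."""
    return [m.group(1) for m in ORG_PATTERN.finditer(text)]

sample = ("A empresa Banco Comercial Português S.A. anunciou resultados, "
          "e a Sonae SGPS também.")
print(extract_org_candidates(sample))
```

A pattern like this over-generates (for example, a capitalized sentence-initial article directly before a name would be included in the match), which is consistent with the manual revision step the excerpt mentions.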

The classification of each name as first or last was then manually verified and, where necessary, corrected, and first names were given their gender values. List2, containing approximately 8,100 complete person names, was first processed manually by establishing the border between first (= given) names and last names (= surnames). This first step allowed us to determine unequivocally how frequently each word is used as a first and/or last name. To all new names, that is, names not yet present in the first list, gender and other information was added, in particular an indication of foreign names (Yuri) and orthographic variants: Melo vs.
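Once the border between given names and surnames has been marked, counting how often each word occurs in each role is straightforward. The sketch below assumes a hypothetical input format in which the manually established border is written as a `|` character; the function names and the sample names are likewise illustrative and do not come from the original paper.

```python
from collections import Counter

def count_name_roles(marked_names, border="|"):
    """Count how often each word is used as a first name and as a last name.

    Each entry is a full name with the given-name/surname border marked
    by `border` (a hypothetical annotation format).
    """
    first_counts, last_counts = Counter(), Counter()
    for entry in marked_names:
        given, _, surname = entry.partition(border)
        first_counts.update(given.split())
        last_counts.update(surname.split())
    return first_counts, last_counts

def ambiguous_words(first_counts, last_counts):
    """Words attested both as a first name and as a last name."""
    return sorted(set(first_counts) & set(last_counts))

names = ["João Baptista | Silva", "Jorge | Baptista", "Maria João | Melo"]
first, last = count_name_roles(names)
print(first.most_common())
print(ambiguous_words(first, last))
```

In this simplified version every whitespace-separated token is counted, so surname particles such as "de" or "dos" would need to be filtered out separately.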

Of the XVII Brazilian Symposium on Artificial Intelligence - SBIA2004, São Luís, Maranhão, Brazil (2004) 235–244
23. SUPOR: Um Ambiente para a Exploração de Métodos Extrativos para a Sumarização Automática de Textos em Português. PhD thesis, Computing Department - UFSCar, São Carlos, São Paulo, Brazil (2003)
24. Automatic Text Summarization Using a Machine Learning Approach. In: Proc. of the 16th Brazilian Symposium on Artificial Intelligence, Porto de Galinhas, Pernambuco, Brazil (2002) 205–215

Building a Dictionary of Anthroponyms

Jorge Baptista1,2, Fernando Batista1,3, and Nuno Mamede1,4

1 L2F – Laboratório de Sistemas de Língua Falada - INESC ID Lisboa, R.

