Elmord's Magic Valley

Software, lingüística e rock'n'roll. Sometimes in English.

Impressions on R7RS-small: libraries, records, exceptions

2017-10-03 15:10 -0300. Tags: comp, prog, lisp, scheme, in-english

In the last post, I wrote a little bit about the historical context in which R7RS came about. In this post, I will comment on my impressions about specific features of the R7RS-small language.

First of all, I'd like to note that if you are going to read the R7RS-small report, you should also read the unofficial errata. As I read the document I spotted a few other errors not mentioned in the errata, but unfortunately I did not keep notes as I was reading. I'm not sure why a, um, revised version of the report is not published with the known errors corrected (a Revised7.01 Report?), but alas, it isn't, so that's something to keep in mind.

In this post, I will talk mainly about the differences between R5RS and R7RS-small, since R7RS-small is more of an incremental extension of R5RS, rather than R6RS. This is not intended as a complete or exhaustive description of each feature; for that, consult the report.

Libraries/modules

R7RS introduced the concept of libraries (what some other systems call modules; I suppose R6RS and R7RS chose the name "library" to avoid conflict with the concept of modules in existing implementations). Library names are lists of symbols and non-negative integers, such as (scheme base), or (srfi 69). A library has a set of imports, a set of exported symbols, and code that constitutes the library. A library definition looks like this:

(define-library (foo bar)
  (import (scheme base)
          (scheme write))
  (export hello)
  (begin
    (define (hello)
      (display "Hello, world!\n"))))

Instead of putting the library code directly in the define-library form (inside a begin clause), it is also possible to include code from a different file with the (include "filename") directive (or include-ci for parsing the file case-insensitively; standard Scheme was case-insensitive until R5RS (inclusive)). This makes it easier to package R5RS code as an R7RS library, by including the older code within a library declaration. It's also a way to avoid writing all library code with two levels of indentation.

Imports can be specified or modified in a number of ways:

These forms can be combined (e.g., you can import only some identifiers and add a prefix to them).

Exports just list all identifiers to be exported, but you can also write (rename internal-name exported-name) to export identifiers with a different name than they have within the library body.

Unlike R6RS, all library code directly embedded in the define-library form must be written within a begin clause. At first I found this kinda weird, but it has an interesting consequence: the library definition sublanguage does not have to know anything about the rest of the programming language. There is only a limited number of kinds of subforms that can appear within define-library, and parsing it does not require knowing about the values of any identifiers. This means that define-library can be more easily processed as data. One can imagine useful tools which read library definitions from files and, say, compute the dependencies of a program, among other possibilities.

In fact, R7RS does not classify define-library or its subforms as syntax forms, i.e., they are something apart from Scheme expressions. This also resolves a problem that would occur if define-library were an expression. The report specifies that the initial environment of a program is empty. So, how would I use import declarations before importing the library where import declaration syntax is defined? Of course one way around this would be to make (scheme base) available by default rather than start with the empty environment. But the solution adopted by R7RS means we don't have to import (scheme base) if we don't want to (for example, if we want to import (scheme r5rs) instead to package R5RS code as an R7RS library). (The report does define for convenience some procedures and syntax forms with the same name as corresponding library subforms, e.g., include.)

R7RS also standardized cond-expand (extended from SRFI 0). cond-expand is a mechanism somewhat like ifdefs in C for conditionally including code depending on whether the implementation defines specific feature symbols, or whether some library is available. This makes it possible to provide different implementations of a procedure (or a whole library) depending on the current implementation. One way we could use it is to write shims, or compatibility layer libraries to provide an uniform interface for features that are implemented differently by various implementations. For example, in Common Lisp, many implemenetation support threads, but they provide different interfaces. Bordeaux Threads is a library which provides a uniform API and maps those to the corresponding functions in each implementation it supports. Now we can do similar things in R7RS for those features that are supported everywhere but in incompatible ways (e.g., for networking).

Libraries and cond-expand are by far the most important addition in R7RS relative to R5RS. Even if we did not have any of the other features, we could package them as libraries and provide implementation-specific code for them via cond-expand.

Missing things

The report does not specify a mapping between library names and file names. I realize that it would be kinda hard to make everyone agree on this, but it is somewhat of a hurdle in distributing programs organized into libraries. Some implementations, such as Chibi, will look up a library named (foo bar) in a file named foo/bar.sld (where .sld stands for Scheme library definition), whereas CHICKEN will look it up at foo.bar.*. There is a project of a portable package manager for R7RS called Snow, which I think takes care of mapping packaged library files to implementation-specific names, but I haven't taken the time to check it out yet.

R7RS takes the excellent step of specifying that library names whose first component is the symbol srfi are reserved for implementing SRFIs, but then fails to specify how to name a specific SRFI. In practice, the few implementations I checked all agree on using (srfi n) as the name of the library implementing SRFI number n (i.e., I can write (import (srfi 69)) and remain reasonably portable), so this may turn out not to be a problem in practice.

Records

R7RS incorporates the define-record-type form from SRFI 9, for defining new record/struct types. It is a somewhat verbose form, which requires specifying the record constructor arguments and the names for each field accessor/getter and (optional) mutator/setter, but it's basically the least common denominator that any implementation which has some form of records can easily support. It looks like this:

(define-record-type <person> (make-person name age) person?
  (name person-name person-name-set!)
  (age person-age person-age-set!))

Here:

R5RS did not have any way to define new types disjunct from existing types. R6RS provides a more complex records facility, including both a syntactic and a procedural layer allowing reflection, but I don't know it well enough to comment. (I have read some comments on problems in the interaction between syntactically- and procedually-defined records, but I don't know the nature of the problems or how serious they are.)

Missing things

Reflection would be nice, or at least a way to convert a record into a vector or something like this (though I realize this might displease some people), but we could make libraries for that. Another thing that would be nice is for records to have a standard printed representation which could be printed out and read back again, but I realize there is a slightly complicated interaction here with the module system (the printed representation should be tagged with the record type in a way that will work regardless of which module it is read back in), and this might not even be desirable for implementation-internal types which happen to be defined in terms of define-record-type.

Exceptions

R7RS incorporates the exception handling mechanisms from R6RS, but not the condition types. Any value can be raised in an exception. The raise procedure raises a value as an exception object, or condition, to be caught by an exception handler. The guard form can be used to install an exception handler to be active during the evaluation of its body. The guard form names a variable to hold the captured condition, a sequence of cond-like clauses to determine what action to take given the condition, and a body to be executed. It looks like this:

(guard (err
        ((file-error? err) (display "Error opening file!\n"))
        ((read-error? err) (display "Error reading file!\n"))
        (else (display "Some other error\n")))
  (call-with-input-file "/etc/passwd"
    (lambda (file)
      (read-line file))))

If an else clause is not provided and no other clause of the guard form matches, the exception propagates up the stack until some handler catches it. If an exception is raised and caught by a guard clause, the value returned by the guard form is whatever is returned by the body of that clause.

Beside raise, R7RS also defines a procedure (error message irritants...), which raises an error object (satisfying the error-object? predicate) encapsulating an error message and a sequence of objects somehow related to the error (called "irritants"). It also defines the procedures error-object-mesage and error-object-irritants to extract the components of the error object.

R7RS does not define specific object types to represent errors; it only says that objects satisfying a given predicate must be raised in some circumstances. An implementation might define a record type for that, or just use lists where the first element represents the error type, or whatever is appropriate for that implementation.

At first I did not think exceptions were that important in the grand scheme of things (heh), since you can implement them on the top of continuations. (And indeed, exceptions in R6RS are in a separate library rather than the base language, although this does not mean much in R6RS because, if I understand correctly, all libraries are mandatory for R6RS implementations.) However, I then realized that until R5RS (inclusive), there was no standard way to signal an error in Scheme code, and perhaps more importantly, no standard way of catching errors. If portable libraries are to become more prominent, we will need a standard way of signalling and catching errors across code from different projects, so exceptions are a good add-on.

Beside raise, R7RS also defines raise-continuable, which raises an exception but, if the guard exception handler returns, it returns to the point where the exception was raised rather than exiting from the guard handler form. [Correction: this is how raise-continuable interacts with with-exception-handler, not guard. I'm still figuring how guard acts with respect to continuable exceptions.] On the top of this, something like Common Lisp's restarts can be implemented.

One side effect of having guard in the language is that now you can do control flow escapes without using call-with-current-continuation (call/cc for short). In theory this could be more efficient than capturing the fully general continuation just to escape from it once; in practice, some implementations may rely on call/cc to implement guard (the example implementation provided in the report does), so this performance advantage may not realize. But just having a construct to express a one-shot escape is already great, because it allows expressing this intent in the program, and potentially allows implementations to emit more efficient code when a full continuation is not required.

I was wondering if one could implement unwind-protect (a.k.a. try/finally) in terms of guard, and so avoid dynamic-wind for error-handling situations. Alas, I don't think this is possible in general, because the presence of raise-continuable means an error handler may execute even though control may still return to the guard body. I wish to write more about continuations in a future post.

Conclusion

Libraries (plus cond-expand), records and exceptions are the most important additions in R7RS-small relative to R5RS, and they are all a great step towards enabling code reuse and portability across implementations, while not constraining Scheme implementors unnecessarily. I am particularly happy about libraries and cond-expand, because this means we can start writing compatibility libraries to bridge differences between implementations without having to rely on a standardization process.

I have some other comments to make on I/O, bytevectors, and other parts of the standard library, but they can wait for a future post.

1 comentário

R5RS, R6RS, R7RS

2017-10-01 22:11 -0300. Tags: comp, prog, lisp, scheme, in-english

Over the last few days I have skimmed over R7RS, the Revised⁷ Report on [the Algorithmic Language] Scheme. I thought I'd write up some of my impressions about it, but I decided first to write a bit about the history and the context in which R7RS came about and the differing opinions in the Scheme community about R6RS and R7RS. In a future post, I intend to write up about my impressions of specific features of the standard itself.

The Scheme language was first described in a document named the "Report on the Algorithmic Language Scheme". Afterwards, a second version, called the "Revised Report on the Algorithmic Language Scheme", came out. The following version of the standard was called the "Revised Revised Report …", or "Revised² Report …" for short. Subsequent versions have kept this naming tradition, and the abbreviation RnRS (for some n) is used to refer to each version of the standard.

Up to (and including) R5RS, all versions of the standard were ratified only by unanimous approval of the Scheme Steering Committee. As a result, each iteration of the standard was a conservative extension of the previous version. R5RS defines a very small language: the whole document is just 50 pages. The defined language is powerful and elegant, but it lacks many functions that are typically expected from the standard library of a modern language and necessary for many practical applications. As a result, each Scheme implementation extended the standard in various ways to provide those features, but they did so in incompatible ways with each other, which made it difficult to write programs portable across implementations.

To amend this situation a bit, the Scheme community came up with the Scheme Requests for Implementation (SRFI) process. SRFIs are somewhat like RFCs (or vaguely like Python's PEPs): they are a way to propose new individual features that can be adopted by the various implementations, in a way orthogonal to the RnRS standardization process. A large number of SRFIs have been proposed and approved, and some are more or less widely supported by the various implementations.

R6RS attempted to address the portability problem by defining a larger language than the previous reports. As part of this effort, the Steering Committee broke up with the tradition of requiring unanimous approval for ratification, instead requring a 60% majority of approval votes. R6RS defines a much larger language than R5RS. The report was split into a 90 page report on the language plus a 71 page report on standard libraries (plus non-normative appendices and a rationale document). The report was ratified with 67 yes votes (65.7%) and 35 no votes (34.3%).

The new report caused mixed feelings in the community. Some people welcomed the new standard, which defined a larger and more useful language than the minimalistic R5RS. Others felt that the report speficied too much, reinvented features in ways incompatible with existing SRFIs, and set some things in stone too prematurely, among other issues.

In response to this divide, the Scheme Steering Committee decided to split the standard into a small language, more in line with the minimalistic R5RS tradition, and a large language, intended to provide, well, a larger language standardizing a larger number of useful features. The R7RS-small report was completed in 2013. The R7RS-large process is still ongoing, being developed in a more incremental way rather than as one big thing to be designed at once.

I think that the R6RS/R7RS divide in part reflects not only differing views on the nature of the Scheme language, but also differing views on the role of the RnRS standards, the Steering Committee, and the SRFI process. In a discussion I read these days, a person was arguing that R6RS was a more useful standard to them, because for most practical applications they needed hashtables, which R6RS standardized but R7RS did not. My first thought was "why should hashtables be included in the standard, if they are already provided by SRFI 69?". This person probably does not consider SRFIs to be enough to standardize a feature; if something is to be portable across implementations, it should go in the RnRS standard. In my (current) view, the RnRS standard should be kept small, and SRFIs are the place to propose non-essential extensions to the language. My view may be colored by the fact that I started using Scheme "for real" with CHICKEN, an implementation which not only supports a large number of SRFIs, but embraces SRFIs as the way various features are provided. For example, whereas many implementations provide SRFI 69 alongside their own hashtable functions, CHICKEN provides SRFI 69 as the one way of using hashtables. So, CHICKEN users may be more used to regard SRFIs as a natural place to get language extensions from, whereas users of some other implementations may see SRFIs as something more abstract and less directly useful.

I have already expressed my view on Scheme's minimalism here, so it's probably no surprise that I like R7RS better than R6RS. I don't necessarily think R6RS is a bad language per se (and I still have to stop and read the whole R6RS report some day), I just have a preference for the standardized RnRS language to be kept small. (I'm okay with a larger standard a la R7RS-large, as long as it remains separate from the small language standard, or at least that the components of the large language remain separate and optional.) I also don't like every feature of R7RS-small, but overall I'm pretty satisfied with it.

Comentários

On Scheme's minimalism

2017-09-14 19:34 -0300. Tags: comp, prog, lisp, scheme, pldesign, ramble, in-english

[This post started as a toot, but grew slightly larger than 500 characters.]

I just realized something about Scheme.

There are dozens, maybe hundreds, of Scheme implementations out there. It's probably one of the languages with the largest number of implementations. People write Schemes for fun, and/or to learn more about language implementations, or whatever. The thing is, if Scheme did not exist, those people would probably still be writing small Lisps, they would just not be Scheme. The fact that Scheme is so minimal means that the jump from implementing an ad-hoc small Lisp to implementing Scheme is not that much (continuations notwithstanding). So even though Scheme is so minimal that almost everything beyond the basics is different in each implementation, if there were not Scheme, those Lisps would probably still exist and not have even that core in common. From this perspective, Scheme's minimalism is its strength, and possibly one of the reasons it's still relevant today and not some forgotten Lisp dialect from the 1970s. It's also maybe one of the reasons R6RS, which departed from the minimalist philosophy, was so contentious.

Plus, that core is pretty powerful and well-designed. It spares each Lisp implementor from part of the work of designing a new language, by providing a solid basis (lexical scoping, proper closures, hygienic macros, etc.) from which to grow a Lisp. I'm not one hundred percent sold on the idea of first class continuations and multiple values as part of this core*, and I'm not arguing that every new Lisp created should be based on Scheme, but even if you are going to depart from that core, the core itself is a good starting point to depart from.

[* Though much of the async/coroutine stuff that is appearing in modern languages can be implemented on the top of continuations, so maybe their placement in that core is warranted.]

1 comentário

Trivia etymologica #3: -ães, -ãos, -ões

2017-08-17 13:29 -0300. Tags: lang, etymology

Em português, palavras terminadas em -ão têm três possíveis terminações de plural diferentes: -ães (cão → cães), -ãos (mão → mãos) e -ões (leão → leões).

Como é que pode isso?

Esse é mais um caso em que olhar para o espanhol pode nos dar uma idéia melhor do que está acontecendo. Vamos pegar algumas palavras representativas de cada grupo e ver como se comportam os equivalentes em espanhol:

Singular Plural
Português Espanhol Português Espanhol
-ães cão can cães canes
capitão capitán capitães capitanes
alemão alemán alemães alemanes
-ãos mão mano mãos manos
irmão hermano irmãos hermanos
cidadão ciudadano cidadãos ciudadanos
-ões leão león leões leones
nação nación nações naciones
leão león leões leones

O que nós observamos é que em espanhol, cada grupo de palavras tem uma terminação diferente tanto no singular (-án, -ano, -ón) quanto no plural (-anes, -anos, -ones), enquanto em português, todas têm a mesma terminação no singular (-ão), mas terminações distintas no plural (-ães, -ãos, -ões). A relação entre as terminações no plural em espanhol* e português é bem direta: o português perdeu o n entre vogais (como já vimos acontecer no episódio anterior), mas antes de sair de cena o n nasalizou a vogal anterior, i.e, anes → ães, anos → ãos, ones → ões.

Já no singular, as três terminações se fundiram em português. O que provavelmente aconteceu (e eu preciso arranjar mais material sobre a história fonológica do português) é que o mesmo processo de perda do n + nasalização ocorreu com as terminações do singular (i.e., algo como can → cã, mano → mão, leon → leõ), e com o tempo essas terminações se "normalizaram" na terminação mais comum -ão. Como conseqüência, o singular perdeu a distinção entre as três terminações, enquanto o plural segue com três terminações distintas.

_____

* Tecnicamente eu não deveria estar partindo das formas do espanhol para chegar nas do português, mas sim partir da língua mãe de ambas, i.e., o dialeto de latim vulgar falado na Península Ibérica. Porém, nesse caso em particular o espanhol preserva essencialmente as mesmas terminações da língua mãe, então não há problema em derivar diretamente as formas do português a partir das do espanhol.

3 comentários

Some thoughts on Twitter and Mastodon

2017-06-25 01:32 -0300. Tags: comp, web, privacy, freedom, life, mind, in-english

Since the last post I've been using Mastodon as my primary microblogging platform for posting, but I was still regularly reading and retweeting stuff on Twitter. A while ago Twitter started reordering tweets in my timeline despite my having disabled that option, just as I said could eventually happen (except much earlier than I expected). The option is still there and is still disabled, it's just being ignored.

Twitter brought me much rejoicing during the years I used it. I follow a lot of cool people there and I've had lots of nice interactions there. I found myself asking if I should accept some abuse from Twitter to keep interacting with those people, and I've been shocked at myself for even asking myself that. I've been using Twitter less and less as of late. (I'd like to be able to say I did it out of principles, but to be completely truthful I find the non-chronological timeline utterly annoying, and that has had as much to do with my leaving as principles.)

Although I switched to Mastodon as my Twitter replacement, Mastodon is not really "another Twitter". Having 500 rather than 140 characters to write initially felt like learning to talk again. Early on when I started using Mastodon, I was going to reply to a person's toot (that's what posts are called in Mastodon) with a short, not-really-one-full-sentence line that is the norm in Twitter. I wrote it down and was like "no, this kind of grunting a half-thought is not going to cut it here". It felt like Twitter's 140 character limit not only limited the kinds of things you could say, but also imposed/favored a "140-character mindset" of not finishing lines of thought or thinking with much depth. As I went on using Mastodon, I found myself writing thoughts I wouldn't have even tried to write in Twitter.

I still open up Twitter once in a while. Today I opened the mobile version in my desktop browser and noticed that the mobile version still shows a chronological timeline, still doesn't pollute the timeline with liked-but-not-retweeted tweets, and is much faster and cleaner than the desktop version. (I still have to un-fix the navigation bar via CSS, but I already had to do that in the desktop version anyway.) It's tempting to start using Twitter again through the mobile version, while it doesn't catch up with the new "features". I know I shouldn't, though. Even if the mobile version never caught up with the misfeatures (I suppose it eventually will, probably in short time), Twitter has already shown they're willing to throw stuff down their users' throats in the name of – what? I'm not even sure. Maybe they want to make Twitter more Facebook-like to attract Facebook users, even if that means alienating the people who used Twitter exactly because it was not like Facebook?

The funny thing is Twitter could simply provide some options for users to control their experience ("(don't) show tweets liked by your followers", "(don't) show tweets you liked to your followers", "(don't) reorder tweets" (the last one is already there, it just doesn't work)). This way they could cater to whatever new audience they have in mind and keep the users who liked how Twitter used to work. They just don't care to. I'm not really sure what are the motivations and goals behind Twitter's actions. For a really long time before the last changes it had been showing the "you might like" box (even if you clicked the "show me less like this" option (the only way to dismiss it) every time) and the "you might like to follow" box (even if you dismissed that too, and even though it also showed undimissable follow suggestions on the right pane anyway). I used to open Twitter pretty much every day, so it didn't really make sense as a user retention strategy. Maybe they want to incentivize people to do specific things on Twitter, e.g., throw in more data about themselves? (Yeah, there was the "add your birthday to your profile" periodic thing too.)

Meh.

3 comentários

Why the new like-based Twitter timeline is terrible

2017-05-31 21:56 -0300. Tags: comp, web, privacy, in-english

Recently, tweets which people I follow 'liked' (but not retweeted) started showing up in my Twitter timeline. Twitter had been showing the "you might like" box with such tweets for quite a long time, but they were separate from normal tweets, and you could dismiss the box (it would come back again after a while, though). Now those 'liked' tweets are showing up intermingled with the normal tweets, and there is no option to disable this.

Now, Twitter has been working hard on its timeline algorithms lately, and, at least initially, the liked tweets it added to my timeline were indeed stuff I liked, and they constituted just a small part of total tweets. That's not the case anymore: now liked tweets seem to be about one third of all tweets I see, and a smaller proportion of them interest me. Moreover, I simply don't want to see that many tweets. If I'm seeing all tweets I used to see plus liked tweets, and liked tweets comprise about a third of all tweets I see now, then I'm seeing about 50% more tweets, and I simply don't have the patience for so much tweetering; I already limit the number of people I follow so as to keep my timeline manageable.

But even if Twitter's algorithms were perfect and showed me only things I wanted to see in an ideal quantity, showing liked-but-not-retweeted tweets would already be bad, for a number of reasons:

As Twitter keeps trying brave new ways of monetizing its users, it's probably going to become more problematic from a privacy perspective. Meanwhile, we now have a quite viable decentralized, free, and usable social network (and I'm already there). The sad thing is that most of the people I follow will probably not migrate from Twitter, but as Twitter keeps getting worse in matters of privacy, transparency and usability, I'm becoming more inclined to leave it, as I have done before.

Update (2017-06-13): Today my Twitter timeline showed up out of order, at least temporarily, even though the "Show me the best Tweets first" options still appears disabled. That one came quick.

1 comentário

Trivia etymologica #2: cheio, full

2017-03-13 00:17 -0300. Tags: lang, etymology

Seguindo o tema de fartura do último episódio, começaremos nosso passeio etimológico de hoje com a palavra cheio. Cheio vem do latim plenus. É uma mudança fonética e tanto, então vamos por partes.

O meio

Cheio em espanhol é lleno. Olhar para o espanhol freqüentemente ajuda a entender a etimologia de uma palavra em português, porque o espanhol conserva alguns sons que se perderam em português, e vice-versa. Neste caso, o espanhol conservou o n de plenus que o português perdeu. O português tem uma séria mania de perder os sons l e n entre vogais do latim. Exemplos:

Quando as vogais antes e depois do l ou n deletado são iguais, o português junta as duas numa só:

E entre certos encontros de vogais (aparentemente entre ea e entre eo), o português prefere inserir um i (produzindo eia, eio):

Por outro lado, quando o l ou n era geminado em latim, isto é, ll e nn, ele é mantido como l e n em português. Já em espanhol, o som resultante do ll do latim continua sendo escrito como ll, mas é pronunciado como o lh do português (com variações dependendo do dialeto), e o nn do latim se torna o ñ do espanhol (pronunciado aproximadamente como o nh do português). Exemplos:

O início

A outra diferença entre cheio e lleno é o som inicial. Nesse caso, nem o português nem o espanhol conservaram o som original do latim, mas pl e cl quase sempre viram ch em português e ll em espanhol. Outros exemplos:

Provavelmente havia alguma peculiaridade na pronúncia desses encontros em latim (ou pelo menos em alguns dialetos de latim), porque eles sofrem modificações em diversas outras línguas românicas também; em particular, em italiano as palavras acima são chiave (o ch tem som de k), chiamare, pioggia e piorare. No geral, o que se observa é uma palatalização desses encontros consonantais: a pronúncia deles tendeu a evoluir para um ponto de articulação mais próximo do palato (céu da boca). Meu palpite é que o l nesses encontros tinha um som mais próximo do nosso lh nesses dialetos (i.e., /klʲavis, plʲuvia/ (algo como clhávis, plhúvia)).

Surpreendentemente, o francês, que costuma ser uma festa de mudanças fonéticas, mantém ambos os encontros intactos (clef, clamer, pluie, pleurer).

Mudanças regulares

Ok. Como vimos, plenus do latim se torna cheio em português. O que vimos são duas mudanças fonéticas regulares na história do português: uma mudança de pl inicial para ch, e uma perda de n entre vogais. No geral, as mudanças de pronúncia que ocorrem ao longo da história de uma língua têm essa natureza regular: elas afetam certos sons em todas as palavras da língua, sempre que eles ocorrem em determinadas circunstâncias. Claro que como com toda regra, ocasionalmente há exceções, por diversos motivos, mas no geral se espera essa regularidade quando se está formulando uma regra que explique a relação entre duas línguas aparentadas (no caso, latim e português).

Mas espera um pouco, você me diz: o português também possui a palavra pleno (e plenitude), que apresenta tanto o pl inicial quanto o n intervocálico original de plenus. Como é que pode isso?

A resposta nesse caso é que pleno é uma palavra reimportada do latim, depois que esses sound shifts já tinham acontecido. O português (e demais línguas românicas, bem como o inglês) contêm inúmeras palavras desse tipo, tipicamente reintroduzidas a partir da época do Renascimento, quando a cultura clássica grega e romana entrou em alta novamente. O resultado é que freqüentemente o português tem duas versões de uma mesma palavra latina: uma herdada naturalmente do latim, passando pelas mudanças de pronúncia históricas da língua, e uma reimportada diretamente do latim, sem a passagem por essas mudanças. Tipicamente, essas palavras reintroduzidas têm um tom mais formal, enquanto as versões "nativas" têm um tom mais coloquial. Exemplos são cheio e pleno, (lugar) vago e vácuo, chave e clave, paço e palácio. Outras vezes o português tem uma palavra nativa, herdada do latim e passando pelas mudanças fonéticas esperados, e diversas palavras relacionadas importadas posteriormente do latim sem essas mudanças. Exemplos são vida (do latim vita) e vital; chuva (do latim pluvia) e pluvial; mês (do latim mensis) e mensal.

Ok, plenus

Vamos voltar para o latim. Plenus em latim significa – adivinhem só – cheio. Em latim também há um verbo relacionado, pleo, plere, que significa encher. O particípio desse verbo é pletus, que, combinado com a típica variedade de prefixos, nos dá palavras como repleto, completo (e, em inglês, deplete). De in- + plere obtemos implere, que é a origem de encher. De sub- + pleo obtemos supplere, de onde vem suprir (e supply, e suplemento). De com/con- + pleo obtemos complere, de onde vem cumprir (além de completo, e complemento).

Em última instância, pleo vem da raiz indo-européia *pleh₁. Outras palavras dessa mesma raiz são plus (e duplus, triplus, etc.) e polys em grego (de onde vem o nosso prefixo poli-).

Há muitas mil eras eu falei por aqui sobre a Lei de Grimm, que descreve a evolução de alguns sons do proto-indo-europeu nas línguas germânicas. Como visto lá, o p do proto-indo-europeu corresponde ao f nas línguas germânicas. E sure enough, full e fill vêm da mesma raiz *pelh₁ que nos dá pleno e cheio. Viel do alemão também vem da mesma raiz. (Esse v se pronuncia /f/. Curiosamente, o alemão vozeou o /f/ para /v/ antes de vogais na Idade Média e shiftou ele de volta para /f/ alguns séculos depois.)

E a essa altura, já estamos todos cheios do assunto.

1 comentário

Trivia etymologica #1: much vs. mucho

2017-03-05 01:43 -0300. Tags: lang, etymology

Há muitos mil anos, explicando para alguém a diferença entre muy e mucho em espanhol, eu fiz uma analogia com o inglês: muy é como very em inglês, e modifica adjetivos e advérbios, e.g., muy rápido = very fast. Já mucho é como many em inglês, e modifica substantivos: muchas cosas = many things. Na verdade essa explicação é parcialmente correta: muchos no plural é como many, e é usado para substantivos contáveis (como em muchas cosas). Mas mucho no singular é como much, e é usado para substantivos não-contáveis, e.g., mucho dinero = much money.

Na época eu me perguntei: será que much e mucho são cognatas? Apesar da similaridade de som e significado, seria pouco provável que as duas palavras fossem cognatas (a menos que o inglês tivesse importado a palavra do espanhol, o que também seria pouco provável). Isso porque se houvesse uma raiz indo-européia comum para as duas palavras, ela teria passado por mudanças fonéticas diferentes no caminho até o inglês e até o espanhol, e seria pouco provável que o resultado final se parecesse tanto em ambos os branches.

As it turns out, as palavras realmente não são cognatas. much vem do inglês médio muche, muchel, do inglês antigo myċel, miċel. O ċ (som de "tch", ou /tʃ/ em IPA) do Old English é resultado de um shift de /k/ para /tʃ/ diante de /e/ e /i/. Assim, a palavra original era algo como /mikel/, que, em última instância, vem da raiz indo-européia *méǵh₂-, que significa 'grande'. É a mesma raiz de mega em grego, e de magnus em latim. *méǵh₂- com o sufixo *-is, *-yos resulta em latim nas palavras magis, que é a origem da palavra mais em português, e maior, que é a origem de (quem imaginaria?) maior em português. Resumindo, much, mega, mais e maior são todas cognatas.

Mucho, por outro lado, assim como o muito do português, vem do latim multus, que é também a origem do prefixo multi e de palavras como múltiplo. Segundo nosso amigo Wiktionary, multus vem da raiz indo-européia *mel-, e é cognata de melior (de onde vem melhor), que nada mais é do que *mel- com o mesmo sufixo *-yos que transforma mag(nus) em magis e maior.

Por fim, muy é uma contração do espanhol antigo muito, cuja derivação é trivial e sugerida como exercício para o leitor.

1 comentário

elmord.org

2017-02-10 23:52 -0200. Tags: about, comp, web

Então, galere: este blog está em vias de se mudar para elmord.org/blog. Eu ainda estou brincando com as configurações do servidor, mas ele já está no ar, e eu resolvi avisar agora porque com gente usando já tem quem me avise se houver algo errado com o servidor novo.

(Incidentalmente, eu também me mudei fisicamente nas últimas semanas, mas isso é assunto para outro post.)

Mas por quê?

Eu resolvi fazer essa mudança por uma porção de motivos.

O plano qüinqüenal

Eu ainda não sei se essa migração vai ser definitiva (vai depender da estabilidade e performance do servidor novo), então pretendo fazê-la em dois passos:

Servidor novo como réplica do atual. Inicialmente, tanto o elmord.org/blog quanto o inf.ufrgs.br/~vbuaraujo/blog vão ficar servindo o mesmo conteúdo. As páginas do elmord.org ficarão apontando o inf.ufrgs.br como link canônico, i.e., search engines e afins vão ser instruídos a continuar usando a versão no inf.ufrgs.br como a versão "oficial" das páginas. Os comentários de ambas as versões serão sincronizados periodicamente (o que dá para fazer com rsync porque os comentários são arquivos texto).

Redirecionamento para o servidor novo. Se daqui a alguns meses eu estiver suficientemente satisfeito com o funcionamento do servidor novo, ele passa a ser o oficial, as páginas param de incluir o header de link canônico para o blog antigo, e o blog antigo passa a redirecionar para as páginas correspondentes do novo. Se eu não me satisfizer com o servidor novo, eu tiro ele do ar, o inf.ufrgs.br continua funcionando como sempre, e fazemos de conta que nada aconteceu.

EOF

Por enquanto é só. Se vocês encontrarem problemas com o site novo, queiram por favor reportar.

6 comentários

Truques com SSH

2017-01-24 19:40 -0200. Tags: comp, unix, network

Pous, nos últimos tempos eu aprendi algumas coisinhas novas sobre o SSH. Neste post relato algumas delas.

Port forwarding

O SSH possui duas opções, -L e -R, que permitem encaminhar conexões de uma porta local para um host remoto e vice-versa.

Porta local para um host remoto

Imagine que você está na sua máquina local, chamada midgard, e há uma máquina remota, chamada asgard, que é acessível por SSH. Você quer acessar um serviço na pora 8000 da máquina asgard a partir da máquina midgard, mas você quer tunelar o acesso por SSH (seja porque você quer que o acesso seja criptografado, ou porque a porta 8000 simplesmente não é acessivel remotamente). Você pode usar o comando:

midgard$ ssh -L 9000:127.0.0.1:8000 fulano@asgard

O resultado disso é que conexões TCP feitas para sua porta local 9000 serão tuneladas através da conexão com fulano@asgard para o endereço 127.0.0.1, porta 8000 na outra ponta. Por exemplo, se asgard tem um servidor web ouvindo na porta 8000, agora você vai poder abrir um browser em midgard, apontar para http://localhost:9000, e a conexão vai cair na porta 8000 de asgard, tudo tunelado por uma conexão SSH.

Note que o 127.0.0.1 é o endereço de destino do ponto de vista do servidor. Você poderia usar outro endereço para acessar outras máquinas na rede do servidor. Por exemplo, se a máquina vanaheim é acessível a partir de asgard, você poderia rodar:

midgard$ ssh -L 9000:vanaheim:8000 fulano@asgard

e agora todos os acessos à porta TCP 9000 da sua máquina local cairão na porta 8000 de vanaheim, tunelados através da conexão SSH com asgard.

Opcionalmente, você pode especificar um "bind address" antes da porta local, para especificar que apenas a porta 9000 de uma interface de rede específica deve ficar ouvindo por conexões. Por exemplo, você pode usar:

midgard$ ssh -L localhost:9000:vanaheim:8000 fulano@asgard

para dizer que a porta deve escutar apenas conexões da própria máquina. (Por padrão, que interfaces serão usadas é decidido pela opção GatewayPorts do cliente SSH, que defaulta para ouvir apenas na interface local de qualquer forma.) Alternativamente, pode-se passar um bind address vazio (i.e., :9000:vanaheim:8000, sem nada antes do primeiro :), para ouvir em todas as interfaces. Dessa maneira, outras máquinas na sua rede local que acessem a porta 9000 de midgard também terão o acesso tunelado para a porta 8000 de asgard. (* também funciona ao invés da string vazia, mas aí você tem que escapar o * para o shell não tentar expandir.)

Porta remota para a máquina local

Também é possível fazer o contrário: instruir o servidor SSH remoto a redirecionar alguma de suas portas para uma máquina e porta acessível a partir da sua máquina local. Para isso, utiliza-se a opção -R. Por exemplo:

midgard$ ssh -R 8000:localhost:22 fulano@asgard

Isso faz com que a porta 8000 em asgard seja tunelada para a porta 22 da máquina local. Agora, se alguém na máquina asgard acessar a porta 8000 (por exemplo, com ssh -p 8000 beltrano@localhost), a conexão vai cair na sua porta 22 local (e a pessoa terá acesso ao seu servidor SSH local). Você pode usar isso se você está atrás de um firewall ou NAT e a máquina remota é acessível pela Internet, mas a sua máquina local não, e você quer dar acesso a algum serviço da sua máquina local à máquina remota. (Já abordamos isso por aqui antes, mas menciono de novo for completeness.)

Proxy SOCKS

O SSH é capaz de funcionar como um proxy SOCKS. Para isso, utiliza-se a opção -D ("dynamic forwarding"):

midgard$ ssh -D localhost:8000 fulano@asgard

Isso faz com que o SSH ouça como um servidor SOCKS na porta 8000 da máquina local. Conexões recebidas nessa porta serão tuneladas para a máquina asgard, que funcionará como um proxy. Você pode então apontar o proxy SOCKS do seu browser ou outra aplicação para localhost, porta 8000.

Outras opções úteis

-C habilita compressão da conexão. E útil principalmente com conexões lentas (numa rede local, a compressão não compensa muito).

Por padrão, se você usa um dos comandos de redirecionamento de portas acima, o SSH faz o redirecionamento e abre uma sessão de shell comum. Se você quer apenas fazer o redirecionamento, pode usar as opções -N (não executa comando remoto) e -f (vai para background (forks) depois de pedir a senha). As opções podem ser combinadas em um único argumento (e.g., -CNf).

Escapes e comandos especiais

Em uma sessão SSH, a seqüência ENTER ~ é reconhecida como um prefixo de escape para acessar uma série de comandos especiais. Se você digitar ENTER ~ ?, verá uma lista de todos os comandos disponíveis:

Supported escape sequences:
 ~.   - terminate connection (and any multiplexed sessions)
 ~B   - send a BREAK to the remote system
 ~C   - open a command line
 ~R   - request rekey
 ~V/v - decrease/increase verbosity (LogLevel)
 ~^Z  - suspend ssh
 ~#   - list forwarded connections
 ~&   - background ssh (when waiting for connections to terminate)
 ~?   - this message
 ~~   - send the escape character by typing it twice
(Note that escapes are only recognized immediately after newline.)

O comando ENTER ~ C abre um prompt onde é possível fazer e cancelar redirecionamentos de porta, com uma sintaxe análoga à das opções vistas anteriormente:

ENTER ~ C
ssh> ?
Commands:
      -L[bind_address:]port:host:hostport    Request local forward
      -R[bind_address:]port:host:hostport    Request remote forward
      -D[bind_address:]port                  Request dynamic forward
      -KL[bind_address:]port                 Cancel local forward
      -KR[bind_address:]port                 Cancel remote forward
      -KD[bind_address:]port                 Cancel dynamic forward

Pasmem.

Observações

O uso das opções de redirecionamento pode ser controlado/desabilitado na configuração do servidor. Consulte a man page sshd_config(5) para mais informações.

2 comentários

Main menu

Posts recentes

Comentários recentes

Tags

comp (114) prog (51) life (44) unix (32) random (27) lang (27) about (24) mind (22) mundane (21) pldesign (20) in-english (19) lisp (17) web (17) ramble (15) img (13) rant (12) privacy (10) scheme (8) freedom (8) lash (7) music (7) esperanto (7) bash (7) academia (7) home (6) mestrado (6) shell (6) conlang (5) copyright (5) misc (5) worldly (4) book (4) php (4) latex (4) editor (4) politics (4) etymology (3) wrong (3) android (3) film (3) tour-de-scheme (3) kbd (3) c (3) security (3) emacs (3) network (3) poem (2) cook (2) physics (2) comic (2) llvm (2) treta (2) lows (2) audio (1) wm (1) philosophy (1) kindle (1) pointless (1) perl (1)

Elsewhere

Quod vide


Copyright © 2010-2018 Vítor De Araújo
O conteúdo deste blog, a menos que de outra forma especificado, pode ser utilizado segundo os termos da licença Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.

Powered by Blognir.