
No Food for Thought

Food is something you should provide to your brain long before coming to this blog. You will find no food recipes here, only raw, serious, non-fake news for mature minds.

Resigning from Debian's Publicity team (sort of), and the status of our public relations

admin Friday January 1, 2016

When I joined Debian, Debian Weekly News was an important tool for me to follow the project. I must have read each issue until, during the Dunc-Tank controversy, I read the following in the introduction of the 2006-09-26 issue:

Debian Weekly News - September 26th, 2006 wrote:
As Debian experiments with funding, the author of DWN is going to experiment with spending less time on Debian. Please understand that due to this there may be no future issues of DWN in the current form or that they will only be released less frequently.


This was horrific not just from a public relations standpoint, but from a manpower standpoint. Having someone replace Joey would be extremely difficult. And indeed, at that point, Debian Weekly News did stop being weekly, before its last issue came out on 2007-07-03.

I cannot blame Joey for what he did. He must have been burnt out at the time, and if it wasn't for the fact that Dunc-Tank was unofficial, it would have been a good reason to leave. I felt compelled to take over from Joey, but that would have been too much work, and I never thought weekly news was a good format anyway.

But when I saw Alexandr Schmehl's call to resurrect DWN inside the Publicity team, I felt it was my duty to help. And that was not hard: the first issues were of very poor quality, and I quickly started reviewing them.

A lot has happened since. Many people joined the team; many people left. A few got quite involved. Unfortunately, no one got involved as much or for as long as Joey did, and the news never regained its former frequency, nor approached its former quality. Few contributors officially resigned, but I guess I am the last one from the initial crew to quit. Over time, my involvement expanded to reviewing all public communications.

Publicity team delegated

Fast forward to September 2015. Many of those in charge of Debian's public relations, including myself, were probably surprised to read Updating delegation - Press/Publicity/Bits. There was good news in it. Despite the title, it essentially made the publicity team a delegated team, which had never been done before. For the first time, the publicity team was officially recognized. Merging the previous teams was perhaps a good idea. But most members of the previous teams were no longer part of the new team, without any explanation. On 2015-10-04, I contacted the DPL asking what this was all about. As the mailing list has been broken for many months, only one of these mails has been archived. I am making these mails public by including them below.

Filipus Klutiero wrote:

> - Team members must word articles in a way that reflects the Debian project's stance on issues rather than their own personal opinion.

This is not an actual task. I do not think delegations should try telling teams how to do their job, and I do not think neutrality should only be requested from the publicity team... if there is such a thing as neutrality.

> - Finally, the Publicity Team is the official Debian contact point for press inquiries and media people outside Debian. When acting in such capacity, members of the team act as a spokesperson of the Debian Project.


I have never been part of the bits team, nor intensively involved in the others. I have no strong opinion on whether merging is a good idea, though it seems reasonable.
That being said, I really think such a cut would be detrimental. I have been focused on project news, but ever since Joey left, "the team" (we did not talk about a team back then) has been understaffed. Removing most members will certainly push quality and timeliness even lower.

Were members removed because you did not want to grant them press privileges? If so, I think this should be canceled, or the team structure should be complexified to support different accesses for different members. I do not think members would be offended. In fact, this was already the case, and I for one have zero desire to get more accesses.

If you confirm your decision, I will let you be the one to officially remove members from the list.


Neil answered the next day, and I replied the day after:

Filipus Klutiero wrote:

> Delegations are not only about tasks, and this doesn't tell them how to fulfil the role. It's a limiting on the scope of the delegation though.

I do not see the neutrality item as a limitation of scope. I do see it as a rule on how the role must be or must not be fulfilled.

If there is concern about releases being controversial or otherwise low in quality, what we need is better reviewing, and this has been discussed before, unfortunately without results so far: https://lists.debian.org/debian-publicity/2011/08/msg00020.html

> I should point out that this text was the one proposed by the team itself: https://lists.debian.org/debian-publicity/2015/08/msg00032.html

I recognize I have not been following closely lately, and I could have reacted earlier, but I did not propose that text, and I see no sign on the mailing list that this text was proposed to project leadership.

>> That being said, I really think such a cut would be detrimental. I have been focused on project news, but ever since Joey left, "the team" (we did not talk about a team back then) has been understaffed. Removing most members will certainly push quality and timeliness even lower.
> This isn't to stop anyone doing any work - I believe that the only actual people who I expect to be 'removed' are Joey, and myself.

First, it would be most useful to have a list of delegations. Failing that, I will guess, since I do not remember seeing a publicity delegation, that you are saying this is the first delegation for the publicity team.

I understand your point - the publicity team has never been delegated, so your delegation does not remove delegates.
On the other hand, the publicity team did exist as a de facto team, as other Debian teams start. I have no problem with turning it into a delegation, nor with removing current members which are determined as undesirable, but delegating omitting existing members effectively removes them. If a decision is taken to remove most of the team, an explanation would help everyone ensure there were no alternative solutions to the problem.

If anyone doubts manpower is an issue, just look at these 3 pages I just consulted - they are *all* outdated, more than 2 weeks after the delegation:
https://wiki.debian.org/Teams/Publicity/#Usual_roles
https://wiki.debian.org/Teams/Press
https://www.debian.org/intro/organization.en.html

And if some think email should suffice, notice the delegation does not even show in our team's mail archive.


Neil was then quiet for weeks. On 2015-10-24, I told him:

Filipus Klutiero wrote:

A new DPN issue has been released. While one longstanding problem appears to be finally gone (only 4 years after https://lists.debian.org/debian-publicity/2011/12/msg00019.html ), the reviewing issue is certainly not. And this is despite half of the persons listed as having contributed to that issue not even being part of the team according to the delegation (this issue has taken a long time to release - I have not verified whether their contributions predate the delegation, or if they chose to contribute despite it).

The situation described in my last mail seems to persist. I do not wish to remain associated in any way with the publicity team in its current state when reasonable efforts are not being done to go forward. Insofar as this makes sense given the current context, I intend to offer my resignation if the situation has not evolved in a month.


His last message came on 2015-10-26:

Neil McGovern wrote:

On Sat, Oct 24, 2015 at 10:58:43AM -0400, Filipus Klutiero wrote:
> A new DPN issue has been released. While one longstanding problem appears to be finally gone (only 4 years after https://lists.debian.org/debian-publicity/2011/12/msg00019.html ), the reviewing issue is certainly not. And this is despite half of the persons listed as having contributed to that issue not even being part of the team according to the delegation (this issue has taken a long time to release - I have not verified whether their contributions predate the delegation, or if they chose to contribute despite it).

I believe it's the latter - you keep insisting that the fact that they're not delegated means they've been banned from doing any work.
This is not the case.

> The situation described in my last mail seems to persist. I do not wish to remain associated in any way with the publicity team in its current state when reasonable efforts are not being done to go forward. Insofar as this makes sense given the current context, I intend to offer my resignation if the situation has not evolved in a month.

I intend to make no further efforts here, you are free to make your own choice on where you work.


I immediately asked Neil to explain:

On 2015-10-26, Filipus Klutiero wrote:

On 2015-10-26 11:46, Neil McGovern wrote:
>I believe it's the latter - you keep insisting that the fact that they're not delegated means they've been banned from doing any work.

What? Did I even claim such a thing once?

In any case, I have tried verifying your belief, but gave up due to the team VCS's breakage... which - ahem - predates the team's halving.


More than 2 months later, my questions remain unanswered. Therefore, as I warned I would, I hereby resign from the Debian Publicity team (or whatever the team is called now, since that delegation apparently also stripped us of an official name).

Where are we?

A few weeks after this, Laura Arjona finally updated the team's wiki page. Unfortunately, things are still far from clear. The page now lists "DPL-Delegated members" followed by "Current members". Some of the DPL-delegated members are not current members, and most of the current members are not DPL-delegated. Even Neil, who by his own account is expected to be removed, is still listed as a current member.

What happened?

According to Ana Guerrero Lopez's mail, this initiative came from at least Cédric Boutillier and herself. I have not worked much with either of them, but I doubt Cédric Boutillier would have intentionally hijacked the publicity team. Considering Laura's edits, I would guess this was an unfortunate accident.

If so, by accepting to delegate without fixing and without verifying, the DPL may simply have made a mistake, and carelessness would be the only thing he could be blamed for. However, by spreading misinformation and disappearing from the discussion, the DPL has now also failed to fix damage he is, at least in large part, responsible for.

Unless we have a cabal trying to slowly perform a discreet coup, those indirectly kicked out can probably consider that they remain in the team. It remains to be determined how they now differ from "DPL-Delegated members".

Conclusion

2 DPN issues in the last 5 months is not much. There are surely obstacles on the writers' side; solving those may help on the quantity front. Unfortunately, since I can count on my fingers the number of times I contributed content to the news, I cannot say what would help the most.

In general though, the team does extremely badly at recruiting and retaining its members, whether writers or reviewers. I am not sure I have seen a single DPN issue with proper credits (despite a 2011 discussion of the problem). On the reviewing side, of the hundred issues for which I performed a final review, only about 10 were free of any issue or significant error. That is not as many as I would wish, but the situation did seem to improve over the years. I hope whoever picks up this task can keep directing their remarks to the writing side and eventually make reviewing less necessary. I also hope the team learns to retain its recruits for as long as I stayed, or longer. And I hope they will finally benefit from reviewing guidelines.

Other general issues having priority are of course fixing the mailing list and the VCS.

Debian has disappointed me in recent years. In a sense, this delegation is a good thing for me, since I probably no longer have the dedication to the project necessary to care about its public relations. I am far from being as involved in Debian as I was when I started reviewing, but I may keep reading DPN. Thanks in advance to those who stay… and good luck.

Update 2016-08-21

In his A year in the life of a DPL talk about his term, Neil mentioned his delegation (at 17:45). Fixing its problems must have seemed harder, as he apparently found it easier to congratulate this "fantastic" team, which was "incredibly active and the amount of stuff we're getting now is absolutely wonderful"… even though the team had published merely 6 news issues in the previous year, versus the 52 published a full decade earlier.

Update 2018-05-27

Chris Lamb has updated the team's delegation to reflect the departure of Ana Guerrero Lopez and Cédric Boutillier, which leaves 2 people in the team. Even in the year before that, the yearly number of issues had already dropped to 4 😑

Update 2022-07-17

The team has just published a news issue… its first this year. In fact, in the last 3 years, the team which Neil calls "incredibly active" has struggled to publish even 1 issue per year. One thing hasn't changed though: it still uses the term "weekly"!

Congratulations, Neil McGovern. Your carelessness, but most importantly your pride and unwillingness to show modesty after being told about your mistakes, have managed to kill what was left of Debian's public relations - and in doing so, to largely kill Debian itself.

Update 2024-09

Is Debian waking up? According to its new DPL, whoever is left in the "incredibly active" team which published exactly 4 issues of its "weekly" news during the last 5 years now "wants us".🙄

Civilization: Beyond Earth on Debian GNU/Linux? Good luck

admin Saturday December 26, 2015

Ever since I moved to GNU/Linux, the video game I missed the most was Sid Meier's Civilization. The only version ported to GNU/Linux was Sid Meier's Alpha Centauri, probably my favorite version. But that port seemed to be an afterthought. One needed to look for the special installer, which was buggy.

With the release of CivBE, I was under the impression that Firaxis was finally truly making GNU/Linux a supported platform for Civilization. The GNU/Linux version was released less than 2 months after the Microsoft Windows version. Mac/Linux was even the fourth item in the game's official FAQ. For the first time in many years, I put a video game on my wish list. To my surprise, my mother offered it to me this week (I suppose she did not realize it was the same series I spent so many hundreds of hours playing over nearly 2 decades 😛).

I was also happy to see the game's box no longer had the huge Games for Windows banner. Unfortunately, the system requirements claimed Windows was necessary. But I thought those were just carelessly written requirements, as usual (how credible are requirements asking for "Windows Vista SP2/ Windows 7" for a Q4 2014 game anyway?). I was less impressed when I inserted the DVD and realized there was absolutely no material for GNU/Linux, nor any documentation explaining where to go. And now, I cannot even find instructions on the Internet. The FAQ item mentioned above still discusses a Linux version as something in the future (although Wikipedia says it was released on 2014-12-18). And I cannot even find installation instructions when searching on Google.

Is Civilization: Beyond Earth beyond Windows? I am far from being convinced at this point.

Hopefully, at least the game will be stable - without serious bugs like those I experienced playing the original versions of Civilization III and V (let alone the serious networking issues with Civilization IV).

Memory usage of Apache's PHP children processes

admin Monday December 14, 2015

I ran a PHP benchmark for which I allowed PHP to take as much memory as it wanted. The benchmark worked, but I then realized Apache was using 2 GB of RAM. The parent process was fine, but it turned out the apache2 child process which had run the benchmark was still using 2 GB (RES).

I thought that was abnormal, but I asked on ##php and eventually had confirmation from several people that - to my great surprise - this is not a memory leak; this behavior is expected. And indeed, I can re-run the same benchmark and it will never run out of memory if it succeeded in reserving enough memory the first time. I am not a sysadmin, but that was still quite a shock. I was told PHP has its own memory manager, and only releases memory when the Apache child is restarted. In reality though, other processes (including other Apache children) will manage to "steal" memory reserved by idle children. This is surely the part I find most amazing. I am curious to learn how Linux manages that.

So, the memory Apache grants to PHP children will sometimes only be released when these children processes are restarted, but other processes will manage to reclaim that memory if needed. At the very least in our configuration (Debian 8's PHP 5.6.14 on Apache 2.4.10 with prefork MPM).
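For context, "as much memory as it wanted" means lifting PHP's per-request cap, memory_limit. A php.ini sketch; the finite value is purely illustrative:

```
; php.ini
; -1 removes the per-request cap (what the benchmark effectively used).
; A finite cap bounds how much an idle child can end up keeping reserved.
memory_limit = -1
;memory_limit = 256M
```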

One important word above is "sometimes". For some reason, children sometimes release their memory immediately. I initially thought it took 2 executions for the memory to stick, but a second execution does not always make it stick, which is why I would welcome pointers to discussions of this behavior. It seems memory will not be freed if 2 requests come with little idle time in between (seconds).
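To see whether an idle child is still holding on to memory, the per-process resident set size is enough. A minimal sketch; the apache2 process name is Debian's, and the fallback to the current shell is only so the commands run anywhere:

```shell
# List the largest apache2 children by resident memory (RSS, in kB).
# "apache2" is Debian's process name; adjust for your distribution.
ps -o pid,rss,comm -C apache2 --sort=-rss 2>/dev/null | head -n 5

# The same measurement for a single PID (here the current shell as a
# stand-in; substitute a child's PID):
rss_kb=$(ps -o rss= -p $$)
echo "RSS: ${rss_kb} kB"
```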

The following shows an Apache restart freeing 2 GB of RAM:

root@Daphnis:/var/log/apache2# free -h; grep Mem /proc/meminfo; service apache2 restart; free -h; grep Mem /proc/meminfo
             total       used       free     shared    buffers     cached
Mem:          3,0G       2,4G       660M       9,5M       688K        75M
-/+ buffers/cache:       2,3G       736M
Swap:         713M       276M       437M
MemTotal:        3173424 kB
MemFree:          675824 kB
MemAvailable:     634736 kB
             total       used       free     shared    buffers     cached
Mem:          3,0G       216M       2,8G       9,4M       756K        88M
-/+ buffers/cache:        126M       2,9G
Swap:         713M       270M       443M
MemTotal:        3173424 kB
MemFree:         2951552 kB
MemAvailable:    2917400 kB

Transition to the SI - A matter of numerous Ms-s

admin Saturday December 12, 2015
##php wrote:

(19:32:13) chealer: so if I consider that PHP's 0 ds should be 1 ds, then that proves my understanding that it's not the DB which adds that extra second.
(19:33:33) Literphor: chealer: What is ds? A decisecond?
(19:33:41) chealer: Literphor: yeah. it's all on a 1 Gb/s LAN, but that probably explain the 3 ds difference.
(19:33:45) chealer: Literphor: right
(19:34:03) Literphor: chealer: Heh you’re the first person I’ve ever seen use those units
(19:34:40) chealer: Literphor: Heh, you're not the first person telling me I'm the first person they see use those units.

dig(1) and other DNS clients sometimes taking 5 seconds to return the results of a local query

admin Friday November 27, 2015

After installing a few Debian VMs inside our Windows environment, I noticed very strange performance problems resolving local domain names on local DNS servers this week. Simple queries which should have taken milliseconds would sometimes be very slow, and these slow queries would consistently take 50 deciseconds to resolve - never 49 or less. It looked like a timeout, but the logs mentioned none, and it was hard to tell when the delays would occur, except that they occurred more often on the first test after I had stopped testing for a few minutes. For example, a trivial connection to a local MySQL server could take just above 50 ds to establish:

$ time echo 'SELECT 1;'|mysql -u [...] --password=[...] -h PC-0002
1
1

real 0m5.014s
user 0m0.000s
sys 0m0.004s
pcloutier@deimos:/var/lib/dpkg/info$


This was far from MySQL-specific. dig(1) would suffer from the same delays:

$ time dig @phobos.lan.gvq titan.lan.gvq

; <<>> DiG 9.9.5-9+deb8u3-Debian <<>> @phobos.lan.gvq titan.lan.gvq
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15593
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1280

;; QUESTION SECTION:
;titan.lan.gvq.			IN	A

;; ANSWER SECTION:
titan.lan.gvq.		3600	IN	A	10.10.1.29

;; Query time: 0 msec
;; SERVER: 10.10.1.23#53(10.10.1.23)
;; WHEN: Fri Nov 27 12:14:42 EST 2015
;; MSG SIZE  rcvd: 58



real 0m5.018s
user 0m0.012s
sys 0m0.004s
pcloutier@deimos:/var/lib/dpkg/info$

...where phobos.lan.gvq is a local DNS server, and titan is just a local hostname which is supposed to resolve very quickly. Attentive readers will notice that Query time indicates 0 ms. This is because the DNS query proper does take 0 ms. The delay comes from the resolution of the name server itself, which I specified by name. This cannot be reproduced with dig if the name server is specified by IP.

This turned out to be an IPv6-related glibc issue. The first big advance came from a Stack Exchange thread, which allowed me to confirm that the delay was due to a timeout in glibc's getaddrinfo(3). This can be verified with high certainty by changing that delay using the resolv.conf timeout option; glibc's default timeout is 5 seconds. For example, if the delay decreases to 3 s after setting "options timeout:3", then you are clearly experiencing timeouts. If not, sorry, this post will not help you.
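The timeout can be measured without dig: any program going through getaddrinfo() will do. A hedged sketch using getent(1); localhost is a placeholder for the name that resolves slowly (titan.lan.gvq in the example above):

```shell
host=localhost  # placeholder; substitute the slow-to-resolve name
start=$(date +%s%N)
getent ahosts "$host" > /dev/null
end=$(date +%s%N)
# A wall time near 5000 ms (or near whatever "options timeout" is set to)
# points to the getaddrinfo() timeout described above.
echo "resolution took $(( (end - start) / 1000000 )) ms"
```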

The next step was to determine whether the timeout was IPv6-related. This can be achieved by disabling IPv6 on the GNU clients, but it may be simpler to just set the options single-request and single-request-reopen. If neither helped, you know your problem is caused by timeouts, but the cause is different from ours, and the rest of this post will not help.
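Collected as a resolv.conf sketch (the nameserver IP is the one from the dig output above; the timeout value is illustrative):

```
# /etc/resolv.conf
nameserver 10.10.1.23
# Diagnostic: shorten the 5 s default to see whether the delay follows it
options timeout:3
# Workaround: serialize the A/AAAA queries instead of sending them in parallel
#options single-request
#options single-request-reopen
```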

If disabling IPv6 helped but single-request and single-request-reopen did not, sorry, I do not know more about your issue. But if single-request or single-request-reopen helped, your problem must be similar to ours. Due to a glibc 2.9 change (see section "DNS NSS improvement"), getaddrinfo() often triggers a communication issue between itself and the DNS server when querying both IPv4 and IPv6 addresses, due to what Ulrich Drepper describes as server-side breakage. Since at least glibc 2.10, if glibc detects that the glitch may have happened, it works around it by re-sending the queries serially rather than in parallel, so the problem "merely" causes a timeout. If there is a firewall between your DNS server and you, see the Stack Exchange thread above. If a firewall issue is excluded and your DNS server is running Windows Server, you are probably experiencing the same incompatibility as we did.

I first thought our Windows Server 2008 [R1] servers were causing this because of an old bug, but according to a 2014 blog post, this still happens with Windows Server 2012 R2. Although the tcpdump shown on the Stack Exchange thread describes pretty well what is going on, I had to perform my own captures to understand why the timeout would only happen sometimes, and succeeded quickly enough. When the problem does not happen, getaddrinfo() queries both A and AAAA (IPv6) records in parallel in packets 7 and 8 and receives both replies in packets 9 and 10:

Capture 1 - no problem
Capture 1 - no problem

Packets 11 and 12 show the DNS query proper, since this capture shows the full activity for the dig command explained above.

When the problem happens, what was packet 9 in capture 1 is gone. Which is why getaddrinfo() retries 5 seconds later (after the gap between packet 26 and 30), in packets 30 and 32, but now sequentially:

Capture 2 - serial retry after 5 seconds timeout
Capture 2 - serial retry after 5 seconds timeout


Why does the problem happen in capture 2? Surely because of that extra color... the beige ARP packets at 24 and 25. In other words, in the first call, the DNS client's IP address is in the DNS server's ARP cache, so the server does not need to resolve it. In the second case, the client's entry in the server's ARP cache has expired, so the server needs to perform an ARP query before it can send what would be packets 9 and 10 in the first case (I would have thought the server could figure out the client's MAC address from packets 22 and 23, but apparently that is not how Windows works).

As explained in Microsoft's ARP caching behavior documentation, in recent Windows versions, an ARP cache record is [usually] maintained for a random time between 30 and 90 seconds after the last time it was used. This must be why the bug was rather hard to track. Therefore, if the server and the client communicate at least every 30 seconds, this timeout should only be experienced once. This means that in the case of Windows Server DNS servers, the behavior would be the same even if glibc did not fall back to serial queries after the timeout.

Causes and solutions

I have not found a server-side workaround (besides, I guess, disabling IPv6). Unfortunately, I believe this needs to be worked around on every GNU client.

It is more interesting to try determining the root cause of this issue and definitive solutions. glibc developers consider it a Windows bug. But would Microsoft leave a bug which must be triggered millions of times per day unfixed for years?

Windows Server

The captures clearly show that glibc starts with the IPv4 query. Which means the Windows server can only send the AAAA reply after it can send the A reply. In general, that must mean it replies to both. But when the server has to wait for an ARP reply before sending its DNS reply, it may have received the AAAA request before it is able to send the A reply. I would need to perform a server-side capture to confirm that, but it could be that Windows detects that situation and decides to send a single reply to save bandwidth and/or favor IPv6 usage. If the goal was simply to favor IPv6, it would probably be better to just send the AAAA reply before the A reply.

Windows may be doing a heuristic optimization by guessing that the client just needs one address, which would certainly be wrong sometimes. This could be considered a bug in so far as failure to reply constitutes a bug.

DNS clients and the protocol

But there is certainly a client-side issue as well at least in this case. The client requests both an IPv4 address and an IPv6 address while it only needs one. Unless this is a strategy to minimize further queries, this is inefficient.

According to this Stack Overflow thread, it is not clear that requesting both A and AAAA records in a single DNS query is possible. And even that would not be the optimal solution, which would be to request whatever single IP address should be used.

From getaddrinfo()'s perspective, nothing can be optimized, since the caller has requested any address to be returned. So the problem is really in dig and other DNS clients calling getaddrinfo() just to resolve a hostname. These clients are all suboptimal. gethostbyname() is optimal, but obsolete since it is not compatible with IPv6. There should be a resolving function which either returns the first IP address obtained, or returns both without blocking on the second. Clearly, each program cannot implement such a function itself. I do not know glibc well, but a C library's API should allow such a resolution; if glibc's does not, glibc has an issue too.
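The two resolution paths can be compared from the command line with getent(1): per its manual, the ahosts database goes through getaddrinfo() with AF_UNSPEC (so both A and AAAA), while hosts performs per-family gethostbyname2() lookups. A small sketch:

```shell
# Dual-family lookup: exercises getaddrinfo(), the call discussed above.
getent ahosts localhost
# Per-family lookup: closer to the obsolete gethostbyname() behavior.
getent hosts localhost
```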

HTML/CSS - Centering

admin Wednesday November 11, 2015

Centering in CSS is not easy. Every time I need to center vertically, I have to search the web to convince myself that I have no choice but to use a hack. So I found it comforting to see this admission, coming from the W3C itself:

At this time (2014), a good way to center blocks vertically without using absolute positioning (which may cause overlapping text) is still under discussion.
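The discussion has since largely settled in flexbox's favor; a minimal CSS sketch avoiding absolute positioning (the class names are made up for the example):

```css
/* Center .child vertically and horizontally inside .parent. */
.parent {
  display: flex;
  align-items: center;     /* centering along the cross (vertical) axis */
  justify-content: center; /* centering along the main (horizontal) axis */
}
```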

Windows Firewall dangers - Is your Windows [8] PC's networking broken after you joined a domain?

admin Friday November 6, 2015

I hate firewalls. One of the first things I do on any personal Windows install since Vista is to disable Windows Firewall. Usually, that's all it takes... plus disabling the maintenance center's firewall monitoring so it stops harassing you about the firewall, of course.

So when I noticed my PC's Apache was no longer reachable from other machines and that the PC would no longer answer pings, Windows Firewall did not come to mind as an obvious suspect. Only after I realized that the problem had started shortly after I joined the install to the enterprise domain did I start suspecting that some GPO was now forcing the firewall. Of course, I then went to check the firewall's status, using the maintenance center. In order to check its status, I clicked Turn on messages about network firewall. The maintenance center then displayed:
Windows Firewall
In English: "The Windows Firewall is disabled or configured incorrectly."
I was quite sure the firewall was not configured incorrectly, since the only configuration I had done was to disable it, so I assumed the firewall was disabled and proceeded to waste at least 10 minutes in further troubleshooting before finally realizing that the damn firewall was actually enabled... despite the button offering to "Enable now".

In the end, this had nothing to do with Group Policy. The problem is that you cannot directly turn off the firewall completely; you have to disable it for every network type: private, public and - when you are on a domain - domain networks, which had not been done on my install. So I clicked Disable Windows Firewall, closed the window, and proceeded to verify that the network was working again - which, of course, was not the case. After trying to reset the network card without success, I went back to the panel to notice that my change had not taken effect. Great: for that specific panel, your changes are discarded without warning unless you select OK.

Conclusion

If your Windows machine's networking stopped working after joining a domain and it will not even send ICMP replies, do verify Windows Firewall, and do so by going to the configuration panel and then to the Windows Firewall panel. And if you need to disable it, select OK.

Addendum

After more issues with Windows Firewall, I dedicated a new post to it.

Debian KDE - A natural choice?

admin Sunday October 25, 2015
Testing migration summary 2015-10-22 wrote:

[...]
caribou 0.4.19-1 0.4.18.1-1
celery 3.1.18-3 3.1.18-2
cervisia 4:15.08.2-1 4:15.08.1-1
django-nose 1.4.2-1 1.4.1-1
dolphin 4:15.08.2-1 4:15.08.1-1
dolphin-plugins 4:15.08.2-1 4:15.08.1-1
dragon 4:15.08.2-1 4:15.08.1-1
dropbear 2015.68-1 2014.65-1
[...]


This semi-random Debian package list, with several KDE elements, suggests Debian tries to portray itself that way.

TP-Link TL-WR1043ND v1 on OpenWrt 15.05

admin Sunday October 4, 2015

I switched my TP-Link TL-WR1043ND v1 from TP-Link's firmware to OpenWrt 15.05 "Chaos Calmer" a couple of weeks ago. Besides errors when PPTP clients tried to connect, there were no unfortunate surprises.

I was happy to see OpenWrt now includes a web interface (LuCI), enabled by default. It is not exactly the most user-friendly, but I found my way easily enough.

Although I did not do much with it, I found a few bugs, notably:

  • Broken realtime graphs
  • ddns-scripts sending unencrypted passwords without warning
  • SSH server (Dropbear) apparently only accessible from LAN, despite the configuration


The documentation is extensive, but its quality is poor. Installing while playing it safe took me quite some time, though part of that was due to a bug in the previous firmware, which did not accept long filenames. Overall, I am not impressed, but I have no regrets. For software coming from a bunch of volunteers, that is fair.

I eventually realized that we had been experiencing "constant" intermittent wireless connectivity problems in 2 locations of the house. One of these is a decameter away from the router. The other is slightly farther, but on the same floor and with no exterior wall in between. At times, there was high packet loss and extreme latency. After discovering OpenWrt bug #12372, which possibly persists in OpenWrt 15.05, I suspected that our issue might have been a symptom of this bug, but the same problem persisted after going back to the manufacturer's firmware and after trying DD-WRT, so I ended up replacing the router with a TP-Link Archer C8.

Error 619 when trying to connect a NAT-ed client to a PPTP server - watch your router

admin Saturday October 3, 2015

Today, I realized PPTP connections from my Windows 7 and 8 machines could no longer reach the PoPToP server I set up at the office. Strangely, nothing had changed on the server (still Debian 7 running PoPToP 1.3.4) or on the server-side router, and my clients had obviously not both changed enough to explain the breakage. The Windows error messages were not too precise. Server logs were also unhelpful, apparently pointing to a bug in PoPToP, which timed out after 30 seconds:

daemon.log
Oct  3 16:47:25 deimos pptpd[24176]: CTRL: Client 173.178.241.108 control connection started
Oct  3 16:47:25 deimos pptpd[24176]: CTRL: Starting call (launching pppd, opening GRE)
Oct  3 16:47:25 deimos pptpd[24176]: GRE: Bad checksum from pppd.
Oct  3 16:47:55 deimos pptpd[24176]: GRE: read(fd=6,buffer=804f620,len=8196) from PTY failed: status = -1 error = Input/output error, usually caused by unexpected termination of pppd, check option syntax and pppd logs
Oct  3 16:47:55 deimos pptpd[24176]: CTRL: PTY read or GRE write failed (pty,gre)=(6,7)
Oct  3 16:47:55 deimos pptpd[24176]: CTRL: Reaping child PPP[24177]
Oct  3 16:47:55 deimos pptpd[24176]: CTRL: Client 173.178.241.108 control connection finished


I finally realized the change to blame was my switching the TP-Link router from TP-Link's firmware to OpenWrt (15.05). I do not understand much of how PPTP works, but it is quite complicated; apparently, it uses non-standard GRE packets. Therefore, I am not sure whether this is a PPTP shortcoming or an OpenWrt bug, but for me the solution was quite simple.

As explained in this description of error 619, there are several possible causes, but even if the client is clearly reaching the server, the issue can be client-side. If there is no firewall on the client OS, you should verify any client-side router, which I did by plugging one of the affected PCs directly into the modem. The VPN could connect again, which confirmed the router was to blame.

OpenWrt does not have a "PPTP VPN passthrough" option to check, but rather a package to install (not installed by default in Chaos Calmer). Following the instructions in OpenWrt's PPTP NAT Traversal document (installing kmod-nf-nathelper-extra), I managed to get the clients to connect while NAT-ed behind OpenWrt.