*** regebro has quit IRC | 00:13 | |
*** srichter has joined #zope | 00:15 | |
*** srichter has quit IRC | 02:17 | |
*** srichter has joined #zope | 03:18 | |
*** benji has quit IRC | 06:25 | |
*** __mac__ has joined #zope | 08:19 | |
*** __mac__ has quit IRC | 08:19 | |
*** __mac__ has joined #zope | 08:23 | |
*** nilo has joined #zope | 09:56 | |
*** sylvain has joined #zope | 10:13 | |
*** MrTango has joined #zope | 10:13 | |
*** regebro has joined #zope | 11:31 | |
*** MrTango has quit IRC | 11:48 | |
*** srichter has quit IRC | 13:11 | |
*** MrTango has joined #zope | 13:29 | |
*** srichter has joined #zope | 13:38 | |
*** nilo has quit IRC | 13:40 | |
*** nilo has joined #zope | 13:52 | |
*** __mac__ has quit IRC | 14:09 | |
*** __mac__ has joined #zope | 14:10 | |
*** benji has joined #zope | 14:18 | |
*** __mac__1 has joined #zope | 14:47 | |
*** __mac__ has quit IRC | 14:47 | |
*** MrTango has quit IRC | 15:03 | |
*** noodlepie has joined #zope | 15:18 | |
*** projekt01 has joined #zope | 15:28 | |
*** nilo has quit IRC | 16:16 | |
*** MrTango has joined #zope | 16:42 | |
*** MrTango has quit IRC | 16:57 | |
*** danielblackburn_ has quit IRC | 17:21 | |
*** danielblackburn has joined #zope | 17:21 | |
*** danielbl_ has joined #zope | 17:24 | |
*** danielblackburn has quit IRC | 17:28 | |
*** flipmcf has joined #zope | 17:44 | |
flipmcf | I'm having an issue making my Jenkins CI realize that a build is bad. | 17:44 |
---|---|---|
flipmcf | when I do ./bin/instance fg on my box, I get an exception: (conflicting zcml definitions) Easy enough to fix. | 17:45 |
flipmcf | but when I do ./bin/instance start - the daemon doesn't log the exception, and just keeps kicking off plone in a tight loop. I never get feedback that things are broken | 17:46 |
flipmcf | other than port 8080 doesn't respond correctly. | 17:46 |
mgedmin | what does your job do, other than start a background daemon? | 17:48 |
flipmcf | my jenkins final test is to do a './bin/instance status' and check the output. If status says it's running, all is good. Well, it's running (zdaemon) but the plone instance is crashing and restarting. This is a fail | 17:48 |
flipmcf | trying to get the actual command out of jenkins, but now that box is down. sigh. | 17:49 |
flipmcf | brb. | 17:49 |
*** projekt01 has quit IRC | 17:50 | |
flipmcf | sh "bin/instance status | grep 'program running' > /dev/null 2>&1" | 17:51 |
flipmcf | Jenkins runs that. grep returns non-zero if 'program running' isn't found, and that's what I'm relying on to trigger a failure. | 17:51 |
mgedmin | ah, ok, so you use jenkins to (re)start daemons? | 17:52 |
mgedmin | hm, this is tricky | 17:52 |
mgedmin | two approaches come to mind: (1) test actual http connectivity in a loop, or (2) grep zdaemon's log files for messages | 17:52 |
flipmcf | yeah. I threw it together with little knowledge - this is the first time I'm using Jenkins for CI | 17:52 |
flipmcf | #2 won't work, because the exception isn't logged. ?! | 17:53 |
mgedmin | CI -- to me -- would mean you start it up, run a bunch of tests, then shut it down | 17:53 |
flipmcf | mgedmin, yep. I get that. | 17:53 |
mgedmin | but that _would_ notice problems if the start up fails the way it does here | 17:53 |
flipmcf | we're really early in this build. We're far from production or code complete, so building Jenkins is happening now. | 17:54 |
flipmcf | we have no tests yet. | 17:54 |
flipmcf | but maybe, if I try to test the entire plone package (yuck) I'd see it. | 17:54 |
flipmcf | but we don't have tests in our product yet. | 17:54 |
mgedmin | from what I remember there may be two (or three) interesting log files | 17:54 |
flipmcf | there are two | 17:54 |
mgedmin | zdaemon has one, zdaemon can redirect the program's stdout to another, and the program itself may have its own logfile | 17:55 |
flipmcf | hum.... | 17:55 |
mgedmin | one of the three should show any errors that prevent zope from starting up | 17:55 |
flipmcf | it's like stderr went somewhere else! good thinking | 17:55 |
mgedmin | you may need to tweak zdaemon's configs to enable all the logs | 17:55 |
flipmcf | tweaking zdaemon's configs is what I feel is right too. But never done that :) | 17:56 |
mgedmin | one log file is <runner> ... transcript ... </runner> in zdaemon.conf (the stdout/stderr of the app goes there) | 17:56 |
mgedmin | another is <eventlog> in zdaemon.conf (zdaemon's own logs go there) | 17:57 |
mgedmin | and zope has its own logs | 17:57 |
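A minimal sketch of approach (2): scanning the log files mgedmin lists for signs of a failed startup. The function name, the marker strings, and the paths in the usage note are assumptions for illustration. Check where your setup actually writes the transcript, the event log, and Zope's own log, and what a ZCML conflict looks like there.

```python
from pathlib import Path

# Assumed markers. The first is the standard Python traceback header;
# the second is a guess at how a ZCML conflict is reported in the log.
ERROR_MARKERS = (
    "Traceback (most recent call last)",
    "ConfigurationConflictError",
)


def find_startup_errors(log_paths, markers=ERROR_MARKERS):
    """Return (path, line_number, line) for every log line containing a marker."""
    hits = []
    for path in map(Path, log_paths):
        if not path.exists():
            # A log that is not enabled has no file; skip it silently.
            continue
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(marker in line for marker in markers):
                    hits.append((str(path), number, line.rstrip("\n")))
    return hits
```

A CI step could call `find_startup_errors(["var/log/instance.log"])` (a hypothetical path) and fail the build when the returned list is non-empty. This only helps once the exception really ends up in one of the files, which is the open question in this conversation.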
mgedmin | TBH if you're thinking about going into production, do consider if you want to use zdaemon | 17:57 |
flipmcf | good thread to pull on. thanks a lot! | 17:57 |
mgedmin | it's not very maintained | 17:57 |
mgedmin | better options might be supervisor or systemd | 17:57 |
flipmcf | I'm hoping supervisord will be used instead. | 17:58 |
mgedmin | (I have my own doubts about the maintenance status of supervisor, but somebody told me they saw recent commits in git, IIRC) | 17:58 |
flipmcf | then there is this other one suggested whose name escapes me. | 17:58 |
mgedmin | the other thing is, the supervisor process (zdaemon/supervisord/systemctl) can launch the zope process | 17:58 |
flipmcf | I've always worked alongside a sysadmin who can figure this out for me. | 17:58 |
mgedmin | but it cannot see when the zope process is ready to serve requests | 17:58 |
mgedmin | which makes detecting this kind of startup failure difficult | 17:59 |
mgedmin | maybe it would make sense for the supervisor to detect a crashed process in the first 60 seconds (configurable?) and then consider it failed, and stop restarting | 17:59 |
mgedmin | but maybe not | 17:59 |
mgedmin | (what if startup failed because of a transient error, e.g. the network being down temporarily?) | 18:00 |
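The "crashed in the first 60 seconds" idea can be sketched as a tiny restart loop. This is not how zdaemon, supervisord, or systemd behave; `run_until_startup_failure`, `min_uptime`, and `max_restarts` are hypothetical names. As noted above, it cannot tell a configuration error from a transient one.

```python
import subprocess
import time


def run_until_startup_failure(command, min_uptime=60.0, max_restarts=None):
    """Run command in the foreground, restarting it whenever it exits.

    If a run ends before min_uptime seconds, treat it as a startup
    failure, stop restarting, and return that exit code. If max_restarts
    is set and has been used up, return None.
    """
    restarts = 0
    while True:
        started = time.monotonic()
        exit_code = subprocess.call(command)
        uptime = time.monotonic() - started
        if uptime < min_uptime:
            # Died during the startup window: give up instead of looping.
            return exit_code
        if max_restarts is not None and restarts >= max_restarts:
            return None
        restarts += 1
```

With `max_restarts=None` the loop keeps restarting a process that ran long enough, which is the usual supervisor behaviour; only an early exit breaks out of it.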
mgedmin | I would rely on a separate monitoring system that watches the HTTP port | 18:00 |
flipmcf | when I showed this to my sysadmin (we're running a plone 4 instance in production) he expected it to fail in 2 minutes | 18:00 |
flipmcf | and it didn't | 18:00 |
mgedmin | (*cough* a cron script because I hate nagios *cough*) | 18:00 |
flipmcf | no. I want to check in a bad zcml config and make jenkins yell at me. | 18:01 |
flipmcf | that's it. | 18:01 |
mgedmin | it would be nice if it were possible to distinguish transient errors from configuration errors in a reliable, automated way | 18:01 |
mgedmin | currently there isn't | 18:01 |
flipmcf | it would also be nice if jenkins launched the instance so selenium can come by and poke around on the rendered templates | 18:01 |
mgedmin | (zope 2 itself is not very well maintained and I would hesitate to put it into production... except the plone people are keeping it alive somehow, so maybe it's not as bad as I make it sound) | 18:02 |
flipmcf | these sound related. | 18:02 |
mgedmin | so, my suggestion: 1. bin/instance start, 2. wait in a loop for port 8080 to start responding to requests, with a timeout | 18:02 |
flipmcf | 'not well maintained' may be a 'not broken don't fix it' in disguise which is where I see zope today. | 18:03 |
mgedmin | oh hey, I've a project here that does the wait loop (for a non-zope web app) with a single wget command | 18:03 |
mgedmin | wget -q --retry-connrefused -T 60 -O /dev/null http://127.0.0.1:8080/ && ... | 18:03 |
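For a CI step that would rather not shell out to wget, the same wait-with-timeout idea can be sketched in Python. This is an illustration, not a drop-in equivalent of the wget command: `wait_for_http` and its parameters are made-up names, and treating only 2xx responses as "ready" is an assumption.

```python
import http.client
import time
import urllib.request


def wait_for_http(url, timeout=60.0, interval=1.0, request_timeout=5.0):
    """Poll url until it answers with a 2xx status or timeout seconds pass.

    Returns True once the server responds successfully, False on timeout.
    """
    # Ignore proxy settings from the environment; this is a local health check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=request_timeout) as response:
                if 200 <= response.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timed out, HTTP error status, etc.: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A Jenkins step could run `bin/instance start`, then call `wait_for_http("http://127.0.0.1:8080/")` and fail the build when it returns False. Unlike a bare `sleep`, the loop returns as soon as the port answers and only waits the full timeout when the instance never comes up.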
flipmcf | how about launch the instance without wrapping it in zdaemon? | 18:03 |
flipmcf | I'm not testing zdaemon, so why is it part of my CI? | 18:03 |
mgedmin | or - hey - can you maybe disable the autorestart-on-crash logic in zdaemon.conf? | 18:04 |
flipmcf | that's where the easy answer is. | 18:04 |
flipmcf | ^^^^^ | 18:04 |
mgedmin | that will not help with the wait-until-it-comes-up bit | 18:04 |
flipmcf | yep, I can wait for the log file to say something about 'ready to handle requests' | 18:05 |
mgedmin | in fact, if you don't want to wait 60 seconds, it'd be better to have an actual shell loop that checks instance status OR port 8080 being open | 18:05 |
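A rough sketch of such a loop, written in Python rather than shell: it polls the port and, on each pass, asks a caller-supplied `is_alive` callable whether the instance is still running, so a dead process fails the build at once instead of after the full timeout. All names here are hypothetical.

```python
import socket
import time


def port_is_open(host, port, connect_timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, is_alive, timeout=60.0, interval=0.5):
    """Poll until the port opens, the process dies, or the deadline passes.

    is_alive is a zero-argument callable supplied by the caller.
    Returns "ready", "died", or "timeout".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if port_is_open(host, port):
            return "ready"
        if not is_alive():
            return "died"
        time.sleep(interval)
    return "timeout"
```

If the instance is launched directly with `subprocess.Popen`, `is_alive` can be `lambda: process.poll() is None`. A check based on `bin/instance status` would not fail fast in the crash loop described here, because the status output reports the program as running throughout.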
*** sylvain has quit IRC | 18:05 | |
mgedmin | anyway good luck! | 18:06 |
flipmcf | sleep(x) is a great way to create race conditions. I don't want to be woken up at 1am because the CI threw a build error because of a race condition. | 18:06 |
flipmcf | even worse, I don't want to wake someone else up because I wrote a race condition. | 18:06 |
flipmcf | jesus said something about that. | 18:07 |
mgedmin | do-not-disturb mode on my phone between 9 pm and 10 am PROBLEM SOLVED :D | 18:07 |
flipmcf | lol | 18:07 |
flipmcf | the person I hire to respond to Jenkins noise shall not have DND on their phone. | 18:07 |
flipmcf | problem solved | 18:08 |
flipmcf | or created.... depends who you are. | 18:08 |
*** __mac__1 has quit IRC | 18:34 | |
flipmcf | I cannot find these configuration files. | 18:36 |
flipmcf | I'm going to spelunk in pdb. hopefully someone can chime in before I learn too much stuff I don't need to know. | 18:37 |
*** MrTango has joined #zope | 20:37 | |
*** __mac__ has joined #zope | 20:43 | |
*** noodlepie has quit IRC | 20:43 | |
*** __mac__ has quit IRC | 20:43 | |
*** noodlepie has joined #zope | 20:44 | |
*** __mac__ has joined #zope | 21:00 | |
*** lregebro has joined #zope | 21:58 | |
*** regebro has quit IRC | 22:00 | |
*** __mac__ has quit IRC | 22:09 | |
*** phizzy has joined #zope | 22:51 | |
*** noodlepie has quit IRC | 22:55 | |
*** MrTango has quit IRC | 23:06 |