Source: mcmc_safe.py
1"""2.. module:: mcmc3 :synopsis: Monte Carlo procedure4.. moduleauthor:: Benjamin Audren <benjamin.audren@epfl.ch>5This module defines one key function, :func:`chain`, that handles the Markov6chain. So far, the code uses only one chain, as no parallelization is done.7The following routine is also defined in this module, which is called at8every step:9* :func:`get_new_position` returns a new point in the parameter space,10 depending on the proposal density.11The :func:`chain` in turn calls several helper routines, defined in12:mod:`sampler`. These are called just once:13* :func:`compute_lkl() <sampler.compute_lkl>` is called at every step in the Markov chain, returning14 the likelihood at the current point in the parameter space.15* :func:`get_covariance_matrix() <sampler.get_covariance_matrix>`16* :func:`read_args_from_chain() <sampler.read_args_from_chain>`17* :func:`read_args_from_bestfit() <sampler.read_args_from_bestfit>`18* :func:`accept_step() <sampler.accept_step>`19Their usage is described in :mod:`sampler`. On the contrary, the following20routines are called at every step:21The arguments of these functions will often contain **data** and/or **cosmo**.22They are both initialized instances of respectively :class:`data` and the23cosmological class. They will thus not be described for every function.24"""25import os26import sys27import math28import random as rd29import numpy as np30import warnings31import scipy.linalg as la32from pprint import pprint33import io_mp34import sampler35def get_new_position(data, eigv, U, k, Cholesky, Rotation):36 """37 Obtain a new position in the parameter space from the eigen values of the38 inverse covariance matrix, or from the Cholesky decomposition (original39 idea by Anthony Lewis, in `Efficient sampling of fast and slow40 cosmological parameters <http://arxiv.org/abs/1304.4473>`_ )41 The three different jumping options, decided when starting a run with the42 flag **-j** are **global**, **sequential** and **fast** (by default) (see43 :mod:`parser_mp` for reference).44 .. warning::45 For running Planck data, the option **fast** is highly recommended, as46 it speeds up the convergence. Note that when using this option, the47 list of your likelihoods in your parameter file **must match** the48 ordering of your nuisance parameters (as always, they must come after49 the cosmological parameters, but they also must be ordered between50 likelihood, with, preferentially, the slowest likelihood to compute51 coming first).52 - **global**: varies all the parameters at the same time. Depending on the53 input covariance matrix, some degeneracy direction will be followed,54 otherwise every parameter will jump independently of each other.55 - **sequential**: varies every parameter sequentially. Works best when56 having no clue about the covariance matrix, or to understand which57 estimated sigma is wrong and slowing down the whole process.58 - **fast**: privileged method when running the Planck likelihood. 
Described59 in the aforementioned article, it separates slow (cosmological) and fast60 (nuisance) parameters.61 Parameters62 ----------63 eigv : numpy array64 Eigenvalues previously computed65 U : numpy_array66 Covariance matrix.67 k : int68 Number of points so far in the chain, is used to rotate through69 parameters70 Cholesky : numpy array71 Cholesky decomposition of the covariance matrix, and its inverse72 Rotation : numpy_array73 Not used yet74 """75 parameter_names = data.get_mcmc_parameters(['varying'])76 vector_new = np.zeros(len(parameter_names), 'float64')77 sigmas = np.zeros(len(parameter_names), 'float64')78 # Write the vector of last accepted points, or if it does not exist79 # (initialization routine), take the mean value80 vector = np.zeros(len(parameter_names), 'float64')81 try:82 for elem in parameter_names:83 vector[parameter_names.index(elem)] = \84 data.mcmc_parameters[elem]['last_accepted']85 except KeyError:86 for elem in parameter_names:87 vector[parameter_names.index(elem)] = \88 data.mcmc_parameters[elem]['initial'][0]89 # Initialize random seed90 rd.seed()91 # Choice here between sequential and global change of direction92 if data.jumping == 'global':93 for i in range(len(vector)):94 sigmas[i] = (math.sqrt(1/eigv[i]/len(vector))) * \95 rd.gauss(0, 1)*data.jumping_factor96 elif data.jumping == 'sequential':97 i = k % len(vector)98 sigmas[i] = (math.sqrt(1/eigv[i]))*rd.gauss(0, 1)*data.jumping_factor99 elif data.jumping == 'fast':100 #i = k % len(vector)101 j = k % len(data.over_sampling_indices)102 i = data.over_sampling_indices[j]103 ###############104 # method fast+global105 for index, elem in enumerate(data.block_parameters):106 # When the running index is below the maximum index of a block of107 # parameters, this block is varied, and **only this one** (note the108 # break at the end of the if clause, it is not a continue)109 if i < elem:110 if index == 0:111 Range = elem112 Previous = 0113 else:114 Range = elem-data.block_parameters[index-1]115 Previous = data.block_parameters[index-1]116 # All the varied parameters are given a random variation with a117 # sigma of 1. This will translate in a jump for all the118 # parameters (as long as the Cholesky matrix is non diagonal)119 for j in range(Range):120 sigmas[j+Previous] = (math.sqrt(1./Range)) * \121 rd.gauss(0, 1)*data.jumping_factor122 break123 else:124 continue125 else:126 print('\n\n Jumping method unknown (accepted : ')127 print('global, sequential, fast (default))')128 # Fill in the new vector129 if data.jumping in ['global', 'sequential']:130 vector_new = vector + np.dot(U, sigmas)131 else:132 vector_new = vector + np.dot(Cholesky, sigmas)133 # Check for boundaries problems134 flag = 0135 for i, elem in enumerate(parameter_names):136 value = data.mcmc_parameters[elem]['initial']137 if((str(value[1]) != str(-1) and value[1] is not None) and138 (vector_new[i] < value[1])):139 flag += 1 # if a boundary value is reached, increment140 elif((str(value[2]) != str(-1) and value[2] is not None) and141 vector_new[i] > value[2]):142 flag += 1 # same143 # At this point, if a boundary condition is not fullfilled, ie, if flag is144 # different from zero, return False145 if flag != 0:146 return False147 # Check for a slow step (only after the first time, so we put the test in a148 # try: statement: the first time, the exception KeyError will be raised)149 try:150 data.check_for_slow_step(vector_new)151 except KeyError:152 pass153 # If it is not the case, proceed with normal computation. 
The value of154 # new_vector is then put into the 'current' point in parameter space.155 for index, elem in enumerate(parameter_names):156 data.mcmc_parameters[elem]['current'] = vector_new[index]157 # Propagate the information towards the cosmo arguments158 data.update_cosmo1_arguments()159 data.update_cosmo2_arguments()160 return True161######################162# MCMC CHAIN163######################164def chain(cosmo1, cosmo2, data, command_line):165 """166 Run a Markov chain of fixed length with a Metropolis Hastings algorithm.167 Main function of this module, this is the actual Markov chain procedure.168 After having selected a starting point in parameter space defining the169 first **last accepted** one, it will, for a given amount of steps :170 + choose randomly a new point following the *proposal density*,171 + compute the cosmological *observables* through the cosmological module,172 + compute the value of the *likelihoods* of the desired experiments at this173 point,174 + *accept/reject* this point given its likelihood compared to the one of175 the last accepted one.176 Every time the code accepts :code:`data.write_step` number of points177 (quantity defined in the input parameter file), it will write the result to178 disk (flushing the buffer by forcing to exit the output file, and reopen it179 again.180 .. note::181 to use the code to set a fiducial file for certain fixed parameters,182 you can use two solutions. The first one is to put all input 1-sigma183 proposal density to zero (this method still works, but is not184 recommended anymore). The second one consist in using the flag "-f 0",185 to force a step of zero amplitude.186 """187 ## Initialisation188 loglike = 0189 # In case command_line.silent has been asked, outputs should only contain190 # data.out. Otherwise, it will also contain sys.stdout191 outputs = [data.out]192 if not command_line.silent:193 outputs.append(sys.stdout)194 use_mpi = False195 # check for MPI196 try:197 from mpi4py import MPI198 comm = MPI.COMM_WORLD199 rank = comm.Get_rank()200 # suppress duplicate output from slaves201 if rank:202 command_line.quiet = True203 use_mpi = True204 except ImportError:205 # set all chains to master if no MPI206 rank = 0207 # Initialise master and slave chains for superupdate.208 # Workaround in order to have one master chain and several slave chains even when209 # communication fails between MPI chains. It could malfunction on some hardware.210 # TODO: Would like to merge with MPI initialization above and make robust and logical211 # TODO: Or if keeping current scheme, store value and delete jumping_factor.txt212 # TODO: automatically if --parallel-chains is enabled213 if command_line.superupdate and data.jumping_factor:214 try:215 jump_file = open(command_line.folder + '/jumping_factor.txt','r')216 #if command_line.restart is None:217 if not use_mpi and command_line.parallel_chains:218 rank = 1219 warnings.warn('MPI not in use, flag --parallel-chains enabled, '220 'superupdate enabled, and a jumping_factor.txt file detected. '221 'If relaunching in the same folder or restarting a run this '222 'will cause all chains to be assigned as slaves. In this case '223 'instead note the value in jumping_factor.txt, delete the '224 'file, and pass the value with flag -f <value>. 
This warning '225 'may then appear again, but you can safely disregard it.')226 else:227 # For restart runs we want to save the input jumping factor228 # as starting jumping factor, but continue from the jumping229 # factor stored in the file.230 starting_jumping_factor = data.jumping_factor231 # This will load the value irrespective of whether it starts232 # with # (i.e. the jumping factor adaptation was started) or not.233 jump_value = jump_file.read().replace('# ','')234 data.jumping_factor = float(jump_value)235 jump_file.close()236 print 'rank = ',rank237 except:238 jump_file = open(command_line.folder + '/jumping_factor.txt','w')239 jump_file.write(str(data.jumping_factor))240 jump_file.close()241 rank = 0242 print 'rank = ',rank243 starting_jumping_factor = data.jumping_factor244 # Recover the covariance matrix according to the input, if the varying set245 # of parameters is non-zero246 if (data.get_mcmc_parameters(['varying']) != []):247 # Read input covariance matrix248 sigma_eig, U, C = sampler.get_covariance_matrix(cosmo1, cosmo2, data, command_line)249 # if we want to compute the starting point by minimising lnL (instead of taking it from input file or bestfit file)250 minimum = 0251 if command_line.minimize:252 minimum, min_chi2 = sampler.get_minimum(cosmo1, cosmo2, data, command_line, C)253 parameter_names = data.get_mcmc_parameters(['last_accepted'])254 for index,elem in parameter_names:255 data.mcmc_parameters[elem]['last_accepted'] = minimum[index]256 #FK: write out the results of the minimzer:257 labels = data.get_mcmc_parameters(['varying'])258 fname = os.path.join(command_line.folder, 'results.minimized')259 with open(fname, 'w') as f:260 f.write('# minimized \chi^2 = {:} \n'.format(min_chi2))261 f.write('# %s\n' % ', '.join(['%16s' % label for label in labels]))262 for idx in xrange(len(labels)):263 bf_value = minimum[idx]264 if bf_value > 0:265 f.write(' %.6e\t' % bf_value)266 else:267 f.write('%.6e\t' % bf_value)268 f.write('\n')269 print 'Results of minimizer saved to: \n', fname270 # if we want to compute Fisher matrix and then stop271 if command_line.fisher:272 sampler.get_fisher_matrix(cosmo1, cosmo2, data, command_line, C, minimum)273 return274 # warning if no jumps are requested275 if data.jumping_factor == 0:276 warnings.warn(277 "The jumping factor has been set to 0. The above covariance " +278 "matrix will not be used.")279 # In case of a fiducial run (all parameters fixed), simply run once and280 # print out the likelihood. This should not be used any more (one has to281 # modify the log.param, which is never a good idea. Instead, force the code282 # to use a jumping factor of 0 with the option "-f 0".283 else:284 warnings.warn(285 "You are running with no varying parameters... I will compute " +286 "only one point and exit")287 data.update_cosmo1_arguments() # this fills in the fixed parameters288 data.update_cosmo2_arguments() # this fills in the fixed parameters289 loglike = sampler.compute_lkl(cosmo1, cosmo2, data)290 io_mp.print_vector(outputs, 1, loglike, data)291 return 1, loglike292 # In the fast-slow method, one need the Cholesky decomposition of the293 # covariance matrix. 
Return the Cholesky decomposition as a lower294 # triangular matrix295 Cholesky = None296 Rotation = None297 if command_line.jumping == 'fast':298 Cholesky = la.cholesky(C).T299 Rotation = np.identity(len(sigma_eig))300 # define path and covmat301 input_covmat = command_line.cov302 base = os.path.basename(command_line.folder)303 # the previous line fails when "folder" is a string ending with a slash. This issue is cured by the next lines:304 if base == '':305 base = os.path.basename(command_line.folder[:-1])306 command_line.cov = os.path.join(307 command_line.folder, base+'.covmat')308 # Fast Parameter Multiplier (fpm) for adjusting update and superupdate numbers.309 # This is equal to N_slow + f_fast N_fast, where N_slow is the number of slow310 # parameters, f_fast is the over sampling number for each fast block and f_fast311 # is the number of parameters in each fast block.312 for i in range(len(data.block_parameters)):313 if i == 0:314 fpm = data.over_sampling[i]*data.block_parameters[i]315 else:316 fpm += data.over_sampling[i]*(data.block_parameters[i] - data.block_parameters[i-1])317 # If the update mode was selected, the previous (or original) matrix should be stored318 if command_line.update:319 if not rank and not command_line.silent:320 print 'Update routine is enabled with value %d (recommended: 50)' % command_line.update321 print 'This number is rescaled by cycle length %d (N_slow + f_fast * N_fast) to %d' % (fpm,fpm*command_line.update)322 # Rescale update number by cycle length N_slow + f_fast * N_fast to account for fast parameters323 command_line.update *= fpm324 previous = (sigma_eig, U, C, Cholesky)325 # Initialise adaptive326 if command_line.adaptive:327 if not command_line.silent:328 print 'Adaptive routine is enabled with value %d (recommended: 10*dimension)' % command_line.adaptive329 print 'and adaptive_ts = %d (recommended: 100*dimension)' % command_line.adaptive_ts330 print 'Please note: current implementation not suitable for multiple chains'331 if rank > 0:332 raise io_mp.ConfigurationError('Adaptive routine not compatible with MPI')333 if command_line.update:334 warnings.warn('Adaptive routine not compatible with update, overwriting input update value')335 if command_line.superupdate:336 warnings.warn('Adaptive routine not compatible with superupdate, deactivating superupdate')337 command_line.superupdate = 0338 # Define needed parameters339 parameter_names = data.get_mcmc_parameters(['varying'])340 mean = np.zeros(len(parameter_names))341 last_accepted = np.zeros(len(parameter_names),'float64')342 ar = np.zeros(100)343 if command_line.cov == None:344 # If no input covmat was given, the starting jumping factor345 # should be very small until a covmat is obtained and the346 # original start jumping factor should be saved347 start_jumping_factor = command_line.jumping_factor348 data.jumping_factor = command_line.jumping_factor/100.349 # Analyze module will be forced to compute one covmat,350 # after which update flag will be set to False.351 command_line.update = command_line.adaptive352 else:353 # If an input covmat was provided, take mean values from param file354 # Question: is it better to always do this, rather than setting mean355 # to last accepted after the initial update run?356 for elem in parameter_names:357 mean[parameter_names.index(elem)] = data.mcmc_parameters[elem]['initial'][0]358 # Initialize superupdate359 if command_line.superupdate:360 if not rank and not command_line.silent:361 print 'Superupdate routine is enabled with value %d (recommended: 
20)' % command_line.superupdate362 if command_line.superupdate < 20:363 warnings.warn('Superupdate value lower than the recommended value. This '364 'may increase the risk of poorly converged acceptance rate')365 print 'This number is rescaled by cycle length %d (N_slow + f_fast * N_fast) to %d' % (fpm,fpm*command_line.superupdate)366 # Rescale superupdate number by cycle length N_slow + f_fast * N_fast to account for fast parameters367 command_line.superupdate *= fpm368 # Define needed parameters369 parameter_names = data.get_mcmc_parameters(['varying'])370 updated_steps = 0371 stop_c = False372 jumping_factor_rescale = 0373 if command_line.restart:374 try:375 jump_file = open(command_line.cov,'r')376 jumping_factor_rescale = 1377 except:378 jumping_factor_rescale = 0379 c_array = np.zeros(command_line.superupdate) # Allows computation of mean of jumping factor380 R_minus_one = np.array([100.,100.]) # 100 to make sure max(R-1) value is high if computation failed381 # Local acceptance rate of last SU*(N_slow + f_fast * N_fast) steps382 ar = np.zeros(command_line.superupdate)383 # Store acceptance rate of last 5*SU*(N_slow + f_fast * N_fast) steps384 backup_ar = np.zeros(5*command_line.superupdate)385 # Make sure update is enabled386 if command_line.update == 0:387 if not rank and not command_line.silent:388 print 'Update routine required by superupdate. Setting --update 50'389 print 'This number is then rescaled by cycle length: %d (N_slow + f_fast * N_fast)' % fpm390 command_line.update = 50 * fpm391 previous = (sigma_eig, U, C, Cholesky)392 # If restart wanted, pick initial value for arguments393 if command_line.restart is not None:394 sampler.read_args_from_chain(data, command_line.restart)395 # If restart from best fit file, read first point (overwrite settings of396 # read_args_from_chain)397 if command_line.bf is not None and not command_line.minimize:398 sampler.read_args_from_bestfit(data, command_line.bf)399 # Pick a position (from last accepted point if restart, from the mean value400 # else), with a 100 tries.401 for i in range(100):402 if get_new_position(data, sigma_eig, U, i,403 Cholesky, Rotation) is True:404 break405 if i == 99:406 raise io_mp.ConfigurationError(407 "You should probably check your prior boundaries... 
because " +408 "no valid starting position was found after 100 tries")409 # Compute the starting Likelihood410 loglike = sampler.compute_lkl(cosmo1, cosmo2, data)411 # Choose this step as the last accepted value412 # (accept_step), and modify accordingly the max_loglike413 sampler.accept_step(data)414 max_loglike = loglike415 # If the jumping factor is 0, the likelihood associated with this point is416 # displayed, and the code exits.417 if data.jumping_factor == 0:418 io_mp.print_vector(outputs, 1, loglike, data)419 return 1, loglike420 acc, rej = 0.0, 0.0 # acceptance and rejection number count421 N = 1 # number of time the system stayed in the current position422 # Print on screen the computed parameters423 if not command_line.silent and not command_line.quiet:424 io_mp.print_parameters(sys.stdout, data)425 # Suppress non-informative output after initializing426 command_line.quiet = True427 k = 1428 # Main loop, that goes on while the maximum number of failure is not429 # reached, and while the expected amount of steps (N) is not taken.430 while k <= command_line.N:431 # If the number of steps reaches the number set in the adaptive method plus one,432 # then the proposal distribution should be gradually adapted.433 # If the number of steps also exceeds the number set in adaptive_ts,434 # the jumping factor should be gradually adapted.435 if command_line.adaptive and k>command_line.adaptive+1:436 # Start of adaptive routine437 # By B. Schroer and T. Brinckmann438 # Modified version of the method outlined in the PhD thesis of Marta Spinelli439 # Store last accepted step440 for elem in parameter_names:441 last_accepted[parameter_names.index(elem)] = data.mcmc_parameters[elem]['last_accepted']442 # Recursion formula for mean and covmat (and jumping factor after ts steps)443 # mean(k) = mean(k-1) + (last_accepted - mean(k-1))/k444 mean += 1./k*(last_accepted-mean)445 # C(k) = C(k-1) + [(last_accepted - mean(k))^T * (last_accepted - mean(k)) - C(k-1)]/k446 C +=1./k*(np.dot(np.transpose(np.asmatrix(last_accepted-mean)),np.asmatrix(last_accepted-mean))-C)447 sigma_eig, U = np.linalg.eig(np.linalg.inv(C))448 if command_line.jumping == 'fast':449 Cholesky = la.cholesky(C).T450 if k>command_line.adaptive_ts:451 # c = j^2/d452 c = data.jumping_factor**2/len(parameter_names)453 # c(k) = c(k-1) + [acceptance_rate(last 100 steps) - 0.25]/k454 c +=(np.mean(ar)-0.25)/k455 data.jumping_factor = np.sqrt(len(parameter_names)*c)456 # Save the covariance matrix and the jumping factor in a file457 # For a possible MPI implementation458 #if not (k-command_line.adaptive) % 5:459 # io_mp.write_covariance_matrix(C,parameter_names,str(command_line.cov))460 # jump_file = open(command_line.folder + '/jumping_factor.txt','w')461 # jump_file.write(str(data.jumping_factor))462 # jump_file.close()463 # End of adaptive routine464 # If the number of steps reaches the number set in the update method,465 # then the proposal distribution should be adapted.466 if command_line.update:467 # Start of update routine468 # By M. Ballardini and T. 
Brinckmann469 # Also used by superupdate and adaptive470 # master chain behavior471 if not rank:472 # Add the folder to the list of files to analyze, and switch on the473 # options for computing only the covmat474 from parser_mp import parse475 info_command_line = parse(476 'info %s --minimal --noplot --keep-fraction 0.5 --keep-non-markovian --want-covmat' % command_line.folder)477 info_command_line.update = command_line.update478 if command_line.adaptive:479 # Keep all points for covmat guess in adaptive480 info_command_line = parse('info %s --minimal --noplot --keep-non-markovian --want-covmat' % command_line.folder)481 # Tell the analysis to update the covmat after t0 steps if it is adaptive482 info_command_line.adaptive = command_line.adaptive483 # Only compute covmat if no input covmat was provided484 if input_covmat != None:485 info_command_line.want_covmat = False486 # This is in order to allow for more frequent R-1 computation with superupdate487 compute_R_minus_one = False488 if command_line.superupdate:489 if not (k+10) % command_line.superupdate:490 compute_R_minus_one = True491 # the +10 below is here to ensure that the first master update will take place before the first slave updates,492 # but this is a detail, the code is robust against situations where updating is not possible, so +10 could be omitted493 if (not (k+10) % command_line.update or compute_R_minus_one) and k > 10:494 # Try to launch an analyze (computing a new covmat if successful)495 try:496 if not (k+10) % command_line.update:497 from analyze import analyze498 R_minus_one = analyze(info_command_line)499 elif command_line.superupdate:500 # Compute (only, i.e. no covmat) R-1 more often when using superupdate501 info_command_line = parse(502 'info %s --minimal --noplot --keep-fraction 0.5 --keep-non-markovian' % command_line.folder)503 info_command_line.update = command_line.update504 R_minus_one = analyze(info_command_line)505 except:506 if not command_line.silent:507 print 'Step ',k,' chain ', rank,': Failed to calculate covariance matrix'508 if command_line.superupdate:509 # Start of superupdate routine510 # By B. Schroer and T. Brinckmann511 c_array[(k-1)%(command_line.superupdate)] = data.jumping_factor512 # If acceptance rate deviates too much from the target acceptance513 # rate we want to resume adapting the jumping factor514 # T. Brinckmann 02/2019: use mean a.r. over the last 5*len(ar) steps515 # instead or the over last len(ar), which is more stable516 if abs(np.mean(backup_ar) - command_line.superupdate_ar) > 5.*command_line.superupdate_ar_tol:517 stop_c = False518 # Start adapting the jumping factor after command_line.superupdate steps if R-1 < 10519 # The lower R-1 criterium is an arbitrary choice to keep from updating when the R-1520 # calculation fails (i.e. returns only zeros).521 if (k > updated_steps + command_line.superupdate) and 0.01 < (max(R_minus_one) < 10.) 
and not stop_c:522 c = data.jumping_factor**2/len(parameter_names)523 # To avoid getting trapped in local minima, the jumping factor should524 # not go below 0.1 (arbitrary) times the starting jumping factor.525 if (c + (np.mean(ar) - command_line.superupdate_ar)/(k - updated_steps)) > (0.1*starting_jumping_factor)**2./len(parameter_names) or ((np.mean(ar) - command_line.superupdate_ar)/(k - updated_steps) > 0):526 c += (np.mean(ar) - command_line.superupdate_ar)/(k - updated_steps)527 data.jumping_factor = np.sqrt(len(parameter_names) * c)528 if not (k-1) % 5:529 # Check if the jumping factor adaptation should stop.530 # An acceptance rate of 25% balances the wish for more accepted531 # points, while ensuring the parameter space is properly sampled.532 # The convergence criterium is by default (26+/-1)%, so the adaptation533 # will stop when the code reaches an acceptance rate of at least 25%.534 # T. Brinckmann 02/2019: use mean a.r. over the last 5*len(ar) steps535 # instead or the over last len(ar), which is more stable536 if (max(R_minus_one) < 0.4) and (abs(np.mean(backup_ar) - command_line.superupdate_ar) < command_line.superupdate_ar_tol) and (abs(np.mean(c_array)/c_array[(k-1) % (command_line.superupdate)] - 1) < 0.01):537 stop_c = True538 data.out.write('# After %d accepted steps: stop adapting the jumping factor at a value of %f with a local acceptance rate %f \n' % (int(acc),data.jumping_factor,np.mean(backup_ar)))539 if not command_line.silent:540 print 'After %d accepted steps: stop adapting the jumping factor at a value of %f with a local acceptance rate of %f \n' % (int(acc), data.jumping_factor,np.mean(backup_ar))541 jump_file = open(command_line.folder + '/jumping_factor.txt','w')542 jump_file.write('# '+str(data.jumping_factor))543 jump_file.close()544 else:545 jump_file = open(command_line.folder + '/jumping_factor.txt','w')546 jump_file.write(str(data.jumping_factor))547 jump_file.close()548 # Write the evolution of the jumping factor to a file549 if not k % (command_line.superupdate):550 jump_file = open(command_line.folder + '/jumping_factors.txt','a')551 for i in xrange(command_line.superupdate):552 jump_file.write(str(c_array[i])+'\n')553 jump_file.close()554 # End of main part of superupdate routine555 if not (k-1) % (command_line.update/3):556 try:557 # Read the covmat558 sigma_eig, U, C = sampler.get_covariance_matrix(559 cosmo1, cosmo2, data, command_line)560 if command_line.jumping == 'fast':561 Cholesky = la.cholesky(C).T562 # Test here whether the covariance matrix has really changed563 # We should in principle test all terms, but testing the first one should suffice564 if not C[0,0] == previous[2][0,0]:565 if k == 1:566 if not command_line.silent:567 if not input_covmat == None:568 warnings.warn(569 'Appending to an existing folder: using %s instead of %s. '570 'If new input covmat is desired, please delete previous covmat.'571 % (command_line.cov, input_covmat))572 else:573 warnings.warn(574 'Appending to an existing folder: using %s. 
'575 'If no starting covmat is desired, please delete previous covmat.'576 % command_line.cov)577 else:578 # Start of second part of superupdate routine579 if command_line.superupdate:580 # Adaptation of jumping factor should start again after the covmat is updated581 # Save the step number after it updated for superupdate and start adaption of c again582 updated_steps = k583 stop_c = False584 cov_det = np.linalg.det(C)585 prev_cov_det = np.linalg.det(previous[2])586 # Rescale jumping factor in order to keep the magnitude of the jumps the same.587 # Skip this update the first time the covmat is updated in order to prevent588 # problems due to a poor initial covmat. Rescale the jumping factor after the589 # first calculated covmat to the expected optimal one of 2.4.590 if jumping_factor_rescale:591 new_jumping_factor = data.jumping_factor * (prev_cov_det/cov_det)**(1./(2 * len(parameter_names)))592 data.out.write('# After %d accepted steps: rescaled jumping factor from %f to %f, due to updated covariance matrix \n' % (int(acc), data.jumping_factor, new_jumping_factor))593 if not command_line.silent:594 print 'After %d accepted steps: rescaled jumping factor from %f to %f, due to updated covariance matrix \n' % (int(acc), data.jumping_factor, new_jumping_factor)595 data.jumping_factor = new_jumping_factor596 else:597 data.jumping_factor = starting_jumping_factor598 jumping_factor_rescale += 1599 # End of second part of superupdate routine600 # Write to chains file when the covmat was updated601 data.out.write('# After %d accepted steps: update proposal with max(R-1) = %f and jumping factor = %f \n' % (int(acc), max(R_minus_one), data.jumping_factor))602 if not command_line.silent:603 print 'After %d accepted steps: update proposal with max(R-1) = %f and jumping factor = %f \n' % (int(acc), max(R_minus_one), data.jumping_factor)604 try:605 if stop-after-update:606 k = command_line.N607 print 'Covariance matrix updated - stopping run'608 except:609 pass610 previous = (sigma_eig, U, C, Cholesky)611 except:612 pass613 command_line.quiet = True614 # Start of second part of adaptive routine615 # Stop updating the covmat after t0 steps in adaptive616 if command_line.adaptive and k > 1:617 command_line.update = 0618 data.jumping_factor = start_jumping_factor619 # Test if there are still enough steps left before the adaption of the jumping factor starts620 if k > 0.5*command_line.adaptive_ts:621 command_line.adaptive_ts += k622 # Set the mean for the recursion formula to the last accepted point623 for elem in parameter_names:624 mean[parameter_names.index(elem)] = data.mcmc_parameters[elem]['last_accepted']625 # End of second part of adaptive routine626 # slave chain behavior627 else:628 # Start of slave superupdate routine629 if command_line.superupdate:630 # If acceptance rate deviates too much from the target acceptance631 # rate we want to resume adapting the jumping factor. 
This line632 # will force the slave chains to check if the jumping factor633 # has been updated634 if abs(np.mean(backup_ar) - command_line.superupdate_ar) > 5.*command_line.superupdate_ar_tol:635 stop_c = False636 # Update the jumping factor every 5 steps in superupdate637 if not k % 5 and k > command_line.superupdate and command_line.superupdate and (not stop_c or (stop_c and k % command_line.update)):638 try:639 jump_file = open(command_line.folder + '/jumping_factor.txt','r')640 # If there is a # in the file, the master has stopped adapting c641 for line in jump_file:642 if line.find('#') == -1:643 jump_file.seek(0)644 jump_value = jump_file.read()645 data.jumping_factor = float(jump_value)646 else:647 jump_file.seek(0)648 jump_value = jump_file.read().replace('# ','')649 #if not stop_c or (stop_c and not float(jump_value) == data.jumping_factor):650 if not float(jump_value) == data.jumping_factor:651 data.jumping_factor = float(jump_value)652 stop_c = True653 data.out.write('# After %d accepted steps: stop adapting the jumping factor at a value of %f with a local acceptance rate %f \n' % (int(acc),data.jumping_factor,np.mean(backup_ar)))654 if not command_line.silent:655 print 'After %d accepted steps: stop adapting the jumping factor at a value of %f with a local acceptance rate of %f \n' % (int(acc), data.jumping_factor,np.mean(backup_ar))656 jump_file.close()657 except:658 if not command_line.silent:659 print 'Reading jumping_factor file failed'660 pass661 # End of slave superupdate routine662 # Start of slave update routine663 if not (k-1) % (command_line.update/10):664 try:665 sigma_eig, U, C = sampler.get_covariance_matrix(666 cosmo1, cosmo2, data, command_line)667 if command_line.jumping == 'fast':668 Cholesky = la.cholesky(C).T669 # Test here whether the covariance matrix has really changed670 # We should in principle test all terms, but testing the first one should suffice671 if not C[0,0] == previous[2][0,0] and not k == 1:672 if command_line.superupdate:673 # If the covmat was updated, the master has resumed adapting c674 stop_c = False675 data.out.write('# After %d accepted steps: update proposal \n' % int(acc))676 if not command_line.silent:677 print 'After %d accepted steps: update proposal \n' % int(acc)678 try:679 if stop_after_update:680 k = command_line.N681 print 'Covariance matrix updated - stopping run'682 except:683 pass684 previous = (sigma_eig, U, C, Cholesky)685 except:686 pass687 # End of slave update routine688 # End of update routine689 # Pick a new position ('current' flag in mcmc_parameters), and compute690 # its likelihood. If get_new_position returns True, it means it did not691 # encounter any boundary problem. Otherwise, just increase the692 # multiplicity of the point and start the loop again693 if get_new_position(694 data, sigma_eig, U, k, Cholesky, Rotation) is True:695 newloglike = sampler.compute_lkl(cosmo1, cosmo2, data)696 else: # reject step697 rej += 1698 if command_line.superupdate:699 ar[k%len(ar)] = 0 # Local acceptance rate of last SU*(N_slow + f_fast * N_fast) steps700 elif command_line.adaptive:701 ar[k%len(ar)] = 0 # Local acceptance rate of last 100 steps702 N += 1703 k += 1704 continue705 # Harmless trick to avoid exponentiating large numbers. This decides706 # whether or not the system should move.707 if (newloglike != data.boundary_loglike):708 if (newloglike >= loglike):709 alpha = 1.710 else:711 alpha = np.exp(newloglike-loglike)712 else:713 alpha = -1714 if ((alpha == 1.) 
or (rd.uniform(0, 1) < alpha)): # accept step715 # Print out the last accepted step (WARNING: this is NOT the one we716 # just computed ('current' flag), but really the previous one.)717 # with its proper multiplicity (number of times the system stayed718 # there).719 io_mp.print_vector(outputs, N, loglike, data)720 # Report the 'current' point to the 'last_accepted'721 sampler.accept_step(data)722 loglike = newloglike723 if loglike > max_loglike:724 max_loglike = loglike725 acc += 1.0726 N = 1 # Reset the multiplicity727 if command_line.superupdate:728 ar[k%len(ar)] = 1 # Local acceptance rate of last SU*(N_slow + f_fast * N_fast) steps729 elif command_line.adaptive:730 ar[k%len(ar)] = 1 # Local acceptance rate of last 100 steps731 else: # reject step732 rej += 1.0733 N += 1 # Increase multiplicity of last accepted point734 if command_line.superupdate:735 ar[k%len(ar)] = 0 # Local acceptance rate of last SU*(N_slow + f_fast * N_fast) steps736 elif command_line.adaptive:737 ar[k%len(ar)] = 0 # Local acceptance rate of last 100 steps738 # Store a.r. for last 5 x SU*(N_slow + f_fast * N_fast) steps739 if command_line.superupdate:740 backup_ar[k%len(backup_ar)] = ar[k%len(ar)]741 # Regularly (option to set in parameter file), close and reopen the742 # buffer to force to write on file.743 if acc % data.write_step == 0:744 io_mp.refresh_file(data)745 # Update the outputs list746 outputs[0] = data.out747 k += 1 # One iteration done748 # END OF WHILE LOOP749 # If at this moment, the multiplicity is higher than 1, it means the750 # current point is not yet accepted, but it also mean that we did not print751 # out the last_accepted one yet. So we do.752 if N > 1:753 io_mp.print_vector(outputs, N-1, loglike, data)754 # Print out some information on the finished chain755 rate = acc / (acc + rej)756 sys.stdout.write('\n# {0} steps done, acceptance rate: {1}\n'.757 format(command_line.N, rate))758 # In case the acceptance rate is too low, or too high, print a warning759 if rate < 0.05:760 warnings.warn("The acceptance rate is below 0.05. You might want to "761 "set the jumping factor to a lower value than the "762 "default (2.4), with the option `-f 1.5` for instance.")763 elif rate > 0.6:764 warnings.warn("The acceptance rate is above 0.6, which means you might"765 " have difficulties exploring the entire parameter space"766 ". Try analysing these chains, and use the output "767 "covariance matrix to decrease the acceptance rate to a "768 "value between 0.2 and 0.4 (roughly).")769 # For a restart, erase the starting point to keep only the new, longer770 # chain.771 if command_line.restart is not None:772 os.remove(command_line.restart)773 sys.stdout.write(' deleting starting point of the chain {0}\n'.774 format(command_line.restart))...
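For reference, the accept/reject test near the end of chain() above is the textbook Metropolis-Hastings rule: an improved log-likelihood is always accepted, a worse one is accepted with probability exp(newloglike - loglike), and a rejection only increases the multiplicity N of the last accepted point. Below is a minimal, self-contained sketch of that rule; the helper name metropolis_accept is illustrative and not part of mcmc_safe.py.

import math
import random as rd

def metropolis_accept(loglike, newloglike):
    # Always accept an improvement of the log-likelihood
    if newloglike >= loglike:
        return True
    # Otherwise accept with probability alpha = exp(newloglike - loglike)
    alpha = math.exp(newloglike - loglike)
    return rd.uniform(0, 1) < alpha

# Usage sketch, mirroring the bookkeeping in chain():
# if metropolis_accept(loglike, newloglike):
#     loglike, N = newloglike, 1   # reset the multiplicity
# else:
#     N += 1                       # stay on the last accepted point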
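The adaptive routine above (option --adaptive) relies on the running recursions quoted in its comments, mean(k) = mean(k-1) + (x_k - mean(k-1))/k and C(k) = C(k-1) + [(x_k - mean(k))(x_k - mean(k))^T - C(k-1)]/k. A minimal NumPy sketch of one such update step follows; the function name adaptive_update is hypothetical and only illustrates the recursion.

import numpy as np

def adaptive_update(mean, C, x, k):
    # Update the running mean first, as the adaptive routine does
    mean = mean + (x - mean) / k
    # Rank-one update of the covariance estimate around the new mean
    diff = np.asarray(x - mean).reshape(-1, 1)
    C = C + (np.dot(diff, diff.T) - C) / k
    return mean, C

# Example with a 2-parameter chain:
# mean, C = np.zeros(2), np.eye(2)
# mean, C = adaptive_update(mean, C, np.array([0.3, -0.1]), k=5)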
Source: pre-processing.py
__author__ = 'mahandong'
import os, errno, re
from lib.file import *
from lib.string import *
import shlex, subprocess
###########################################################################
dataDir = check_dir('/Volumes/1/data/')  # './data/'
createdDir = ['./manual', './lexicon', './train']
defaultZipFileExtension = '.tgz'
defaultWavFileExtension = '.wav'
promptsSubDir = 'etc/'  # dir within each data zip file that contains the PROMPTS file
wavSubDir = 'wav/'  # dir within each data zip file that contains the wav files
mfcSubDir = 'mfc/'
integratedPROMPTSFilePath = "./manual/prompts"
wlistFullPath = './manual/wlist'
dlogFullPath = './manual/dlog'
cmd_list = []
###########################################################################


def init(createdDir):
    for currDir in createdDir:
        if not os.path.exists(currDir):
            mkdir(currDir)
init(createdDir)

# create a prompts file - the list of words we will record in the next step
targetDataFolder = []  # all folders that have a defaultZipFileExtension file


def unzipFolders(dataDir):
    dataFiles = os.listdir(dataDir)
    for i in dataFiles:
        if os.path.splitext(i)[-1] == defaultZipFileExtension:  # all zip files
            if os.path.splitext(i)[0] not in targetDataFolder:
                targetDataFolder.append(os.path.splitext(i)[0])  # e.g. 23yipikaye-20100807-ujm
            if 1:  # not os.path.isdir(dataDir + os.path.splitext(i)[0]):
                command_line = 'tar -zxf ' + dataDir + i + ' -C ' + dataDir
                cmd_list.append(command_line)
                cmd(command_line)
    print str(len(targetDataFolder)) + ' data folders in data source'
unzipFolders(dataDir)

# check: each dir must have a PROMPTS file and a WAV folder to be included
# (long running time)
passDir = []


def checkDataQuality(targetDataFolder, modifyPrompts=0):
    length1 = len(targetDataFolder)
    totalModifiedNumber = 0
    # iterate over a copy, since entries may be removed from the list below
    for i in targetDataFolder[:]:
        promptsName = dataDir + check_dir(i) + promptsSubDir + 'PROMPTS'
        wavDir = dataDir + check_dir(i) + wavSubDir
        if (not file_exist(promptsName)) or (not os.path.isdir(wavDir)):
            targetDataFolder.remove(i)
            passDir.append(i)
        else:
            if modifyPrompts == 1:
                modify = 0
                with open(promptsName) as p:
                    content = p.read()
                    last = content.rsplit('\n')[-1]
                ### all the modifications needed for the prompts file
                ### list of changes
                replacedText = content
                #replacedText = content.replace(' & ', ' AND ')
                #replacedText = replacedText.replace(' 2000 ', ' TWO THOUSAND ')
                #replacedText = replacedText.replace("\'EM", "THEM")
                if replacedText != content:
                    modify = 1
                    with open(promptsName, 'w') as newFile:
                        newFile.write(replacedText)
                ###
                if not last.strip() == '':
                    modify = 1
                    addBlankLineAtFileEnd(promptsName)
                if modify == 1:
                    totalModifiedNumber += 1
    print str(length1 - len(targetDataFolder)) + ' dir do not contain PROMPTS file or WAV folder, ' + str(len(targetDataFolder)) + ' left usable'
    targetDataFolder.sort()
checkDataQuality(targetDataFolder, 0)

existing = os.listdir(dataDir)
targetDataFolder_existing = list(set(existing) & set(targetDataFolder))
dataPROMPTSPathList = [dataDir + check_dir(i) + promptsSubDir + 'PROMPTS' for i in targetDataFolder_existing]
passDir.extend(list(set(targetDataFolder) - set(targetDataFolder_existing)))

# generate the integrated PROMPTS file
# LONG
def getPrompts(integratedPROMPTSFilePath, dataPROMPTSPathList, dataDir):
    if file_exist(integratedPROMPTSFilePath):
        rm(integratedPROMPTSFilePath)
    vi(integratedPROMPTSFilePath)
    [addBlankLineAtFileEnd(x) for x in dataPROMPTSPathList]
    cat(integratedPROMPTSFilePath, 
dataPROMPTSPathList)88 removeEmptyLinesInFile(integratedPROMPTSFilePath)89 #modify current prompts file to contain the full path in the first col90 ###fast91 fhOut = open(integratedPROMPTSFilePath + '_tmp', 'wb')92 with open(integratedPROMPTSFilePath, 'r') as prompts:93 lines = prompts.readlines()94 prompts.close()95 for i in lines:96 fhOut.write(check_dir(dataDir) + i)97 fhOut.close()98 command_line = 'mv ' + integratedPROMPTSFilePath + '_tmp ' + integratedPROMPTSFilePath99 os.system(command_line)100 removeEmptyLinesInFile(integratedPROMPTSFilePath)101 ###102 print 'Integrated prompts file generated'103getPrompts(integratedPROMPTSFilePath, dataPROMPTSPathList, dataDir)104#########################################################################105#generate wlist file106"""107The HTK Perl script prompts2wlist can take the prompts file you just created,108and remove the file name in the first column and print each word on one line into a word list file (wlist).109"""110def getWlist(wlistFullPath, integratedPROMPTSFilePath):111 try:112 command_line = 'perl ./lib/HTK_scripts/prompts2wlist ' + integratedPROMPTSFilePath + ' ' + wlistFullPath113 cmd_list.append(command_line)114 cmd(command_line)115 print 'wlist generated'116 except Exception as e:117 print 'wlist generation error' + str(e)118 ifContinue()119 # wlist contains non-alphabetical characters: ERROR [+5013] ReadString: String too long120 #normalize wlist file by Handong Ma121 command_lines = ['cp ' + wlistFullPath + ' ' + wlistFullPath + '_ori',122 'sed \'/[\\"\,\:\;\&\.\\\/\!\s*]/d\' '+ wlistFullPath + ' > ./tmp1',123 'tr \'[:lower:]\' \'[:upper:]\' < ./tmp1 > ./tmp2', # TO UPPER CASE124 'sed \'/^-/d\' ./tmp2 > tmp1',125 'sed "/^\'/d" tmp1 > tmp2',126 'sed -e \'s/[0-9]*//g\' tmp2 > tmp1', # DELETE NUMBERS127 'sed \'/^$/d\' tmp1 > tmp2', # DELETE EMPTY LINE128 'awk \'!x[$0]++\' tmp2 > ' + wlistFullPath,129 'rm tmp1 tmp2']130 for command_line in command_lines:131 os.system(command_line)132 #manually add the following entries to your wlist file (in sorted order):133 try:134 fhOut = open(wlistFullPath, 'a')135 fhOut.write('SENT-END\nSENT-START')136 fhOut.close()137 command_line = 'sort ' + wlistFullPath + ' -o ' + wlistFullPath138 cmd_list.append(command_line)139 os.system(command_line)140 print "wlist file edited and sorted"141 except:142 print "edit wlist file error"143 ifContinue()144getWlist(wlistFullPath, integratedPROMPTSFilePath)145# add pronunciation dictionary146'''147The next step is to add pronunciation information (i.e. the phonemes that make up the word) to each of the words in the wlist file,148thus creating a Pronunciation Dictionnary. 
HTK uses the HDMan command to go through the wlist file,149and look up the pronunciation for each word in a separate lexicon file,150and output the result in a Pronunciation Dictionnary.151'''152def runHDManGetMonophone(wlistFullPath,dictPath='./manual/dict',mono0Path='./manual/monophones0',mono1Path='./manual/monophones1',dlogPath='./manual/dlog'):153 fhOut = open('./manual/global.ded', 'w')154 fhOut.write(155 'AS sp\nRS cmu\nMP sil sil sp') # This is mainly used to convert all the words in the dict file to uppercase156 fhOut.close()157 command_line = 'cp ./lib/support_data/VoxForge/VoxForgeDict ./lexicon/'158 cmd_list.append(command_line)159 cmd(command_line)160 #this step requires that HTK is successfully installed on the machine and HDMan is executable161 try:162 #run HDMan163 command_line = "HDMan -A -D -T 1 -m -w "+wlistFullPath+" -n "+mono1Path+" -i -l "+dlogPath+" "+dictPath+" ./lexicon/VoxForgeDict"164 cmd_list.append(command_line)165 cmd(command_line)166 #create monophones0167 command_line = 'sed /^sp$/d '+mono1Path+' > '+mono0Path # remove the short-pause "sp" entry168 cmd_list.append(command_line)169 os.system(command_line)170 ##method 2, with stdout171 #command_line = 'sed /^ax$/d ./manual/monophones1'172 #cmd_stdout2file(command_line, './manual/monophones0')173 print 'HDMan program run and monophones1/dict/monophones0 files created'174 except Exception as e:175 print 'HDMan running error' + str(e)176 ifContinue()177runHDManGetMonophone(wlistFullPath)178#create a Master Label File (MLF)179def getMLF():180 try:181 command_line = 'perl ./lib/HTK_scripts/prompts2mlf ./manual/words.mlf ./manual/prompts'182 cmd_list.append(command_line)183 cmd(command_line)184 except Exception as e:185 print 'mlf file creating error: ', str(e)186 ifContinue()187 ####modify mlf file188 os.system('sed -e \'s/^2000$/TWO THOUSAND/g\' ./manual/words.mlf > ./manual/words.mlf_new')189 os.system('sed -e \'s/^&$/AND/g\' ./manual/words.mlf_new > ./manual/words.mlf')190 os.system('sed -e \"s/^\'EM$/THEM/g\" ./manual/words.mlf > ./manual/words.mlf_new')191 os.system('mv ./manual/words.mlf_new ./manual/words.mlf')192getMLF()193#Phone Level Transcriptions194"""195Next you need to execute the HLEd command to expand the Word Level Transcriptions to Phone Level Transcriptions - i.e.196replace each word with its phonemes, and put the result in a new Phone Level Master Label File197This is done by reviewing each word in the MLF file,198and looking up the phones that make up that word in the dict file you created earlier,199and outputing the result in a file called phones0.mlf (which will not have short pauses ("sp"s) after each word phone group).200"""201#######missing words202'''203unmatched = []204with open(dlogFullPath, 'r') as dlog:205 lines = dlog.readlines()206 for line in lines:207 if line.rstrip().isupper():208 unmatched.append(line.rstrip())209sed(unmatched, './manual/words.mlf', './manual/words.mlf')210'''211#######212########## ERROR:ERROR [+6550] LoadHTKList: Label Name Expected {NO NUMBER SHOULD BE INCLUDED IN prompts FILE, CHANGE TO ENGLISH REPRESENTATION}213fhOut = open('./manual/mkphones0.led', 'w')214fhOut.write("EX\nIS sil sil\nDE sp\n") # remember to include a blank line at the end of this script)215fhOut.close()216command_line = 'HLEd -A -D -T 1 -l \'*\' -d ./manual/dict -i ./manual/phones0.mlf ./manual/mkphones0.led ./manual/words.mlf '217cmd_list.append(command_line)218cmd(command_line)219fhOut = open('./manual/mkphones1.led', 'w')220fhOut.write("EX\nIS sil sil\n") # remember to include a blank 
line at the end of this script)221fhOut.close()222command_line = 'HLEd -A -D -T 1 -l \'*\' -d ./manual/dict -i ./manual/phones1.mlf ./manual/mkphones1.led ./manual/words.mlf '223cmd_list.append(command_line)224cmd(command_line)225#############226#step 5227fhOut = open('./manual/wav_config', 'w')228fhOut.write("SOURCEFORMAT = WAV\n\229TARGETKIND = MFCC_0_D\n\230TARGETRATE = 100000.0\n\231SAVECOMPRESSED = T\n\232SAVEWITHCRC = T\n\233WINDOWSIZE = 250000.0\n\234USEHAMMING = T\n\235PREEMCOEF = 0.97\n\236NUMCHANS = 26\n\237CEPLIFTER = 22\n\238NUMCEPS = 12\n") # remember to include a blank line at the end of this script)239fhOut.close()240codetrainContent = []241trainScpContent = []242noWavDir = []243for dirs in targetDataFolder_existing:244 fullDir = dataDir + check_dir(dirs) + wavSubDir # './data/Aaron-20080318-lbk/wav/'245 newMfcDir = dataDir + check_dir(dirs) + mfcSubDir246 if os.path.isdir(fullDir):247 wavFiles = os.listdir(fullDir)248 for currWav in wavFiles:249 if os.path.splitext(currWav)[-1] == defaultWavFileExtension:250 currWavFullPath = fullDir + currWav251 currMfcFullPath = newMfcDir + os.path.splitext(currWav)[0] + '.mfc'252 trainScpContent.append(currMfcFullPath)253 if not file_exist(fullDir + os.path.splitext(currWav)[0] + '.mfc') or not os.path.isdir(check_dir(newMfcDir)) or len(os.listdir(check_dir(newMfcDir)))==0:254 codetrainContent.append(currWavFullPath + ' ' + currMfcFullPath)255 mkdir(check_dir(newMfcDir))256 else:257 noWavDir.append(dir)258 print str(dir) + 'still contains no wav file'259 if dir in targetDataFolder_existing:260 targetDataFolder_existing.remove(dir)261fhOut = open('./manual/codetrain.scp', 'w')262for i in codetrainContent:263 fhOut.write(i+'\n')264fhOut.close()265#LONG if codetrainContent is big266if len(codetrainContent) >0:267 command_line = 'HCopy -A -D -T 1 -C ./manual/wav_config -S ./manual/codetrain.scp '268 cmd_list.append(command_line)269 cmd(command_line)270#end of data pre-processing271######################################################################272#start of Monophones273cmd('cp ./lib/support_data/proto ./manual/')274cmd('cp ./lib/support_data/config ./manual/')275#Note: the target kind in you proto file (the "MFCC_0_D_N_Z" on the first line),276# needs to match the TARGETKIND in your config file.277fhOut = open('./manual/train.scp', 'w')278excludePattenInWavFiles = ['-old1', '-original']279for i in trainScpContent:280 passWav = 0281 for ex in excludePattenInWavFiles:282 if ex in i:283 passWav = 1284 if passWav == 0:285 fhOut.write(i+"\n")286fhOut.close()287mk_new_dir('./manual/hmm0')288command_line = 'HCompV -A -D -T 1 -C ./manual/config -f 0.01 -m -S ./manual/train.scp -M ./manual/hmm0 ./manual/proto'289cmd_list.append(command_line)290cmd(command_line)291#Flat Start Monophones292cmd('cp ./manual/monophones0 ./manual/hmm0/')293cmd('mv ./manual/hmm0/monophones0 ./manual/hmm0/hmmdefs')294#modify hmmdefs295'''296put the phone in double quotes;297add '~h ' before the phone (note the space after the '~h'); and298copy from line 5 onwards (i.e. 
starting from "<BEGINHMM>" to "<ENDHMM>") of the hmm0/proto file and paste it after each phone.
Leave one blank line at the end of your file.
'''
os.system('sed -e \'1,4d\' ./manual/hmm0/proto > ./manual/hmm0/proto_prune')
with open('./manual/hmm0/hmmdefs', 'r') as hmmdefs:
    defsLines = hmmdefs.readlines()
with open('./manual/hmm0/proto_prune', 'r') as proto:
    protoPart = proto.readlines()
fhOut = open('./manual/hmm0/hmmdefs_new', 'w')
for defsLine in defsLines:
    newLine = '~h ' + '\"' + defsLine.rstrip() + '\"\n'
    fhOut.write(newLine)
    for j in protoPart:
        fhOut.write(j)
fhOut.write('\n')  # Leave one blank line at the end of your file.
fhOut.close()
cmd('mv ./manual/hmm0/hmmdefs_new ./manual/hmm0/hmmdefs')
# Create macros file
os.system('head -3 ./manual/hmm0/proto > ./manual/hmm0/proto_head')
os.system('cat ./manual/hmm0/proto_head ./manual/hmm0/vFloors > ./manual/hmm0/macros')
# Re-estimate monophones
for i in range(15):
    j = i + 1
    mkdir('./manual/hmm' + str(j))
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/phones0.mlf -t 250.0 150.0 1000.0 -S ./manual/train.scp -H ./manual/hmm0/macros -H ./manual/hmm0/hmmdefs -M ./manual/hmm1 ./manual/monophones0'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/phones0.mlf -t 250.0 150.0 1000.0 -S ./manual/train.scp -H ./manual/hmm1/macros -H ./manual/hmm1/hmmdefs -M ./manual/hmm2 ./manual/monophones0'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/phones0.mlf -t 250.0 150.0 1000.0 -S ./manual/train.scp -H ./manual/hmm2/macros -H ./manual/hmm2/hmmdefs -M ./manual/hmm3 ./manual/monophones0'
cmd_list.append(command_line)
cmd(command_line)
# step 7
#####################################################################################################################
existFiles = os.listdir('./manual/hmm4/')
if len(existFiles) == 0:
    os.system('cp ./manual/hmm3/* ./manual/hmm4/')
    ############################
    print 'need manual correction for ./manual/hmm4/hmmdefs here'
    ############################
else:
    print 'file exists in ./manual/hmm4/hmmdefs, continue?'
    ifContinue()
fhOut = open('./manual/sil.hed', 'w')
fhOut.write('AT 2 4 0.2 {sil.transP}\n\
AT 4 2 0.2 {sil.transP}\n\
AT 1 3 0.3 {sp.transP}\n\
TI silst {sil.state[3],sp.state[2]}\n')
fhOut.close()
command_line = 'HHEd -A -D -T 1 -H ./manual/hmm4/macros -H ./manual/hmm4/hmmdefs -M ./manual/hmm5/ ./manual/sil.hed ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/phones1.mlf -t 250.0 150.0 3000.0 -S ./manual/train.scp -H ./manual/hmm5/macros -H ./manual/hmm5/hmmdefs -M ./manual/hmm6 ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/phones1.mlf -t 250.0 150.0 3000.0 -S ./manual/train.scp -H ./manual/hmm6/macros -H ./manual/hmm6/hmmdefs -M ./manual/hmm7 ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
### step 8
command_line = 'HVite -A -D -T 1 -l \'*\' -o SWT -b SENT-END -C ./manual/config -H ./manual/hmm7/macros -H ./manual/hmm7/hmmdefs -i ./manual/aligned.mlf -m -t 250.0 150.0 1000.0 -y lab -a -I ./manual/words.mlf -S ./manual/train.scp ./manual/dict ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/aligned.mlf -t 250.0 150.0 3000.0 -S ./manual/train.scp -H ./manual/hmm7/macros -H ./manual/hmm7/hmmdefs -M ./manual/hmm8 ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/aligned.mlf -t 250.0 150.0 3000.0 -S ./manual/train.scp -H ./manual/hmm8/macros -H ./manual/hmm8/hmmdefs -M ./manual/hmm9 ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
# step 9
fhOut = open('./manual/mktri.led', 'w')
fhOut.write('WB sp\nWB sil\nTC\n')
fhOut.close()
# This creates 2 files: wintri.mlf and triphones1
command_line = 'HLEd -A -D -T 1 -n ./manual/triphones1 -l \'*\' -i ./manual/wintri.mlf ./manual/mktri.led ./manual/aligned.mlf'
cmd_list.append(command_line)
cmd(command_line)
# create the mktri.hed file
command_line = 'perl ./lib/HTK_scripts/maketrihed ./manual/monophones1 ./manual/triphones1'
cmd_list.append(command_line)
cmd(command_line)
os.system('mv ./mktri.hed ./manual/')
#
command_line = 'HHEd -A -D -T 1 -H ./manual/hmm9/macros -H ./manual/hmm9/hmmdefs -M ./manual/hmm10 ./manual/mktri.hed ./manual/monophones1'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/wintri.mlf -t 250.0 150.0 3000.0 -S ./manual/train.scp -H ./manual/hmm10/macros -H ./manual/hmm10/hmmdefs -M ./manual/hmm11 ./manual/triphones1'
cmd_list.append(command_line)
cmd(command_line)
command_line = 'HERest -A -D -T 1 -C ./manual/config -I ./manual/wintri.mlf -t 250.0 150.0 3000.0 -s ./manual/stats -S ./manual/train.scp -H ./manual/hmm11/macros -H ./manual/hmm11/hmmdefs -M ./manual/hmm12 ./manual/triphones1'
cmd_list.append(command_line)
cmd(command_line)
# step 10
command_line = 'HDMan -A -D -T 1 -b sp -n ./manual/fulllist -g ./manual/global.ded -l ./manual/flog ./manual/dict-tri ./lexicon/VoxForgeDict'
cmd_list.append(command_line)
cmd(command_line)
vi('./manual/fulllist1')
os.system('cat ./manual/fulllist ./manual/triphones1 > ./manual/fulllist1')
os.system('perl ./lib/HTK_scripts/fixfulllist_pl ./manual/fulllist1 ./manual/fulllist')
######## tree.hed modification
os.system('cp ./lib/support_data/tree.hed ./manual/')
command_line = 'perl ./lib/HTK_scripts/mkclscript.prl TB 350 ./manual/monophones0 >> ./manual/tree.hed'
cmd_list.append(command_line)
os.system(command_line)
### cautiously append to the file
fhOut = open('./manual/tree.hed', 'a')
fhOut.write('\nTR 1\n\n\
AU "fulllist"\n\
CO "tiedlist"\n\n\
ST "trees"\n')
fhOut.close()
########
# ERROR [+2662] FindProtoModel: no proto for sp in hSet
# fix by deleting the sp line in the ./manual/fulllist file and running from the ./manual dir
"""
os.system('./manual/HHEd -A -D -T 1 -H ./hmm12/macros -H ./hmm12/hmmdefs -M ./hmm13 ./tree.hed ./triphones1')
command_line = 'HHEd -A -D -T 1 -H ./manual/hmm12/macros -H ./manual/hmm12/hmmdefs -M ./manual/hmm13 ./manual/tree.hed ./manual/triphones1'
cmd_list.append(command_line)
cmd(command_line)
"""
sed('sp', './manual/fulllist', './manual/fulllist')
command_line = 'cd ./manual && HHEd -A -D -T 1 -H ./hmm12/macros -H ./hmm12/hmmdefs -M ./hmm13 ./tree.hed ./triphones1'
cmd_list.append(command_line)
os.system(command_line)
# create hmm14
command_line = 'cd ./manual/ && HERest -A -D -T 1 -T 1 -C config -I wintri.mlf -s stats -t 250.0 150.0 3000.0 -S train.scp -H hmm13/macros -H hmm13/hmmdefs -M hmm14 tiedlist'
cmd_list.append(command_line)
os.system(command_line)
os.system('say "hmm14 has finished"')
# create hmm15
command_line = 'cd ./manual/ && HERest -A -D -T 1 -T 1 -C config -I wintri.mlf -s stats -t 250.0 150.0 3000.0 -S train.scp -H hmm14/macros -H hmm14/hmmdefs -M hmm15 tiedlist'
cmd_list.append(command_line)
os.system(command_line)
os.system('say "your hmm15 has finished"')
#####
# The hmmdefs file in the hmm15 folder,
# along with the tiedlist file,
# can now be used with Julian to recognize your speech!
#####
###############################################################################################
# GMM splits
fhOut = open('./manual/split.hed', 'w')
fhOut.write('MU 2 {*.state[2-4].mix}\n')
fhOut.close()
for i in range(16, 21):
    mkdir('./manual/hmm' + str(i))
#os.system('cd ./manual/ && HLEd -A -D -T 1 -n triphones1 -l \'*\' -i wintri.mlf mktri.led aligned.mlf')
command_line = 'cd ./manual/ && HHEd -A -D -T 1 -H hmm15/macros -H hmm15/hmmdefs -M hmm16 split.hed tiedlist'
cmd_list.append(command_line)
os.system(command_line)
command_line = 'cd ./manual/ && HERest -A -D -T 1 -C config -I wintri.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm16/macros -H hmm16/hmmdefs -M hmm17 tiedlist'
cmd_list.append(command_line)
os.system(command_line)
os.system('say "your hmm17 has finished"')
command_line = 'cd ./manual/ && HERest -A -D -T 1 -C config -I wintri.mlf -t 250.0 150.0 3000.0 -s stats -S train.scp -H hmm17/macros -H hmm17/hmmdefs -M hmm18 tiedlist'
cmd_list.append(command_line)
os.system(command_line)
os.system('say "hmm18 has finished"')
# ERROR [+2663] ChkTreeObject: TB only valid for 1 mix diagonal covar models
# solution 1: http://www.voxforge.org/home/dev/acousticmodels/linux/create/htkjulius/tutorial/triphones/step-10/comments/getting-error-in-tree-clustering
# ERROR [+7036] NewMacro: macro or model name ST_ax_2_1 already exists
# solution: use split.hed instead of tree.hed
#os.system('sed -e \'s/^TB/TC/g\' ./manual/tree.hed > tmp')
#os.system('mv ./tmp ./manual/tree2.hed')
# ERROR [+5010] InitSource: Cannot open source file t+ow
'''
command_line = 'cd ./manual/ && HHEd -A -D -T 1 -H hmm18/macros -H hmm18/hmmdefs -M hmm19 split.hed tiedlist'
cmd_list.append(command_line)
os.system(command_line)
os.system('say "hmm19 has finished"')
'''
command_line = 'cd ./manual/ && HERest -A -D -T 1 -T 1 -C config -I wintri.mlf -s stats -t 250.0 150.0 3000.0 -S train.scp -H hmm18/macros -H hmm18/hmmdefs -M hmm20 tiedlist'
cmd_list.append(command_line)
os.system(command_line)
os.system('say "hmm20 has finished"')
command_line = 'cd ./manual/ && HERest -A -D -T 1 -T 1 -C config -I wintri.mlf -s stats -t 250.0 150.0 3000.0 -S train.scp -H hmm20/macros -H hmm20/hmmdefs -M hmm21 tiedlist'
cmd_list.append(command_line)
os.system(command_line)
os.system('say "hmm21 has finished"')
os.system('say "Splitting Hidden Markov Model task has finished"')
###############################################################################################
# Running Julian live
# cp julian config
testDataDir = '/Volumes/1/E6998_testing'
motif = 'prompts'
integratedPROMPTSFilePath_testing = './manual/prompts_testing'
wlistFullPath_testing = './manual/wlist_testing'
dictPath = './manual/dict_testing'
dictTriPath = './manual/dict-tri'
grammarFilePath = './manual/fixed.grammar'
vocaFilePath = './manual/fixed.voca'
configFilePath = './manual/julian.jconf'
wavsFilePath = './manual/wavPath_testing'
mfcsFilePath = './manual/mfcPath_testing'
scpFilePath = './manual/testing.scp'
validationPath = './manual/validation_testing'
####
#### * optional
# rename files in the CUE6998_2014_09-20140929 folder to the same names as in prompts
if 0:
    for i in excludeNamesStartWith(os.listdir('/Volumes/1/E6998_testing/CUE6998_2014_09-20140929')):
        if re.search('vf5', i):
            j = i.replace('5', '9')
            os.system('mv ' + check_dir('/Volumes/1/E6998_testing/CUE6998_2014_09-20140929') + i + ' ' + check_dir('/Volumes/1/E6998_testing/CUE6998_2014_09-20140929') + j)
targetDirs = getDirNamesInCurrDir(testDataDir)
targetPrompts = [check_dir(x) + searchFileWithSimilarNameMotif_returnBest(x, motif) for x in targetDirs]
targetWavs = [check_dir(x) + y for x in targetDirs for y in excludeNamesStartWith(os.listdir(x))]
targetWavs = [x for x in targetWavs if x.endswith('.wav')]
targetMfcs = [x.replace('wav', 'mfc') for x in targetWavs]
getPrompts(integratedPROMPTSFilePath_testing, targetPrompts, testDataDir)
removeMHatInFile(integratedPROMPTSFilePath_testing)  # no ^M symbol allowed
getWlist(wlistFullPath_testing, integratedPROMPTSFilePath_testing)
runHDManGetMonophone(wlistFullPath_testing, dictPath)
# generate wav list file
fhIn = open(wavsFilePath, 'w')
fhIn.write('\n'.join(targetWavs))
fhIn.close()
# generate mfc list file
fhIn = open(mfcsFilePath, 'w')
fhIn.write('\n'.join(targetMfcs))
fhIn.close()
# generate scp file for HCopy
fhIn = open(scpFilePath, 'w')
for i in range(len(targetWavs)):
    fhIn.write(targetWavs[i] + ' ' + targetMfcs[i] + '\n')
fhIn.close()
command_line = 'HCopy -A -D -T 1 -C ./manual/wav_config -S ' + scpFilePath
cmd_list.append(command_line)
cmd(command_line)
######## analysis of the prompts file
# max sentence length
count = getWordCountEachLine(integratedPROMPTSFilePath_testing)
sentenceLength = [x - 1 for x in count]
print "the max sentence length is: "
print max(sentenceLength)
# voca table (vocab words)
#rerun point 614
voca2D = read_file_as_2D_dict(integratedPROMPTSFilePath_testing)
# dict table from HDMan (with phones)
dict2D = read_file_as_2D_dict(dictTriPath, '\s\s+')  # dictPath!
###### writing the .grammar file
if not file_exist(grammarFilePath):
    vi(grammarFilePath)
fhIn = open(grammarFilePath, 'w')
firstLine = 'S: NS_B '
otherLines = ''
vocaGroup = []
for i in range(1, max(sentenceLength) + 1):
    firstLine += numToWords(i).upper() + "_LOOP "
    otherLines += numToWords(i).upper() + "_LOOP: " + numToWords(i).upper() + '_WORD\n'
    vocaGroup.append(numToWords(i).upper() + '_WORD')
firstLine += "NS_E\n"
allContent = firstLine + otherLines
fhIn.write(allContent)
fhIn.close()
###### writing the voca file
vi(vocaFilePath + '_tmp')
fhIn = open(vocaFilePath + '_tmp', 'w')
otherLines = ''
for i in range(len(vocaGroup)):
    occurred = []
    flag = "% " + vocaGroup[i]
    otherLines += flag + '\n'
    for line in range(len(voca2D)):
        if i + 1 in voca2D[line].keys():  # first column is the address, neglect it
            currWord = voca2D[line][i + 1]
            if not currWord in occurred:
                occurred.append(voca2D[line][i + 1])
                otherLines += currWord + '\n'
    otherLines += '\n'
fhIn.write(otherLines)
fhIn.close()
######
# NORMALIZE vocab to map with dict
command_lines = ['cp ' + vocaFilePath + '_tmp' + ' ' + vocaFilePath + '_tmp' + '_ori',
                 'sed \'/[\\"\,\:\;\&\.\\\/\!\s*]/d\' ' + vocaFilePath + '_tmp' + ' > ./tmp1',
                 'tr \'[:lower:]\' \'[:upper:]\' < ./tmp1 > ./tmp2',  # TO UPPER CASE
                 'sed \'/^-/d\' ./tmp2 > tmp1',
                 'sed "/^\'/d" tmp1 > tmp2',
                 'sed -e \'s/[0-9]*//g\' tmp2 > tmp1',  # DELETE NUMBERS
                 'sed \'/^$/d\' tm1 > '.replace('tm1', 'tmp1') + vocaFilePath + '_tmp',  # DELETE EMPTY LINES
                 #'awk \'!x[$0]++\' tmp2 > ' + vocaFilePath + '_tmp',  # delete duplicate lines
                 'rm tmp1 tmp2']
for command_line in command_lines:
    os.system(command_line)
# mapping dict_testing to fixed.voca
fhIn = open(vocaFilePath + '_tmp', 'r')
allVocab = fhIn.readlines()
fhIn.close()
totalNotFind = []
for i in range(len(allVocab)):
    vocab = allVocab[i].rstrip()
    if not vocab.startswith('%'):
        find = 0
        for j in range(len(dict2D)):
            if dict2D[j][0].upper() == vocab.upper():
                find = 1
                try:
                    allVocab[i] = vocab + '\t' + dict2D[j][1] + '\n'  # dict2D[j][1] for dictTriPath, [2] for dictPath
                    break
                except KeyError:
                    print 'The following lines are not correctly aligned, please make sure that phones have separate keys'
                    print "modify the corresponding line in the dict file and add an extra space (make it two) between the second and third columns"
                    print dict2D[j]
                    print 'rerun from <#rerun point 614>'
                    ifContinue()
        if find == 0:
            totalNotFind.append(vocab)
os.system('say "mapping phones finished"')
fhIn = open(vocaFilePath, 'w')
fhIn.write('% NS_B\n<s>\tsil\n\n% NS_E\n</s>\tsil\n')
fhIn.write(''.join(allVocab))
fhIn.close()
# delete sp at the end of each line
sed_replace('sp$', '', vocaFilePath, vocaFilePath)
#sed -e s/'SP'$/''/g fixed.voca
### error running mkdfa.pl
# Warning: dfa_minimize not found in the same place as mkdfa.pl
# solution: make sure mkfa/dfa_minimize is in the same folder as mkdfa.pl (if .dSYM is listed as an extension, see the next comment)
# solution: change mkfa -> mkfa.dSYM [in line 15] and dfa_minimize -> dfa_minimize.dSYM [in line 18] in the mkdfa.pl file
command_line = 'cd ./manual/ && perl ../lib/HTK_scripts/mkdfa.pl fixed'
os.system(command_line)
if not file_exist(configFilePath):
    os.system('cp ./lib/support_data/julian.jconf ./manual/')
    print 'need to manually change the parameters'
    ifContinue()
# test grammar
command_line = 'cd ./manual/ && generate.dSYM fixed'
os.system(command_line)
## manually fix any line that raises an error in fixed.dict
# Error: voca_load_htkdict: line 920: corrupted data:
command_line = 'cd ./manual/ && julius.dSYM -input mic -C ./julian.jconf'
os.system(command_line)
## error: ERROR: Error while setup work area for recognition
# comment out the following lines:
# -iwsp               # append a skippable sp model at all word ends
# -iwsppenalty -70.0  # transition penalty for the appended sp models
# run with results written to files (list of files input)
command_line = 'julius.dSYM -filelist ./mfcPath_testing -C ./julian.jconf -outfile'
os.system(command_line)
#########################################################
# Evaluation, sentence alignment
# get the sentences from the prompts into a 2D dict
promptSentence2D = {}
for i in targetPrompts:
    if file_exist(i):
        dirName = os.path.dirname(i).split('/')[-1]
        with open(i, 'r') as fhIn:
            content = fhIn.readlines()
        tmp = {}
        for line in content:
            if line:
                ele = line.rstrip().split(' ')
                first = ele.pop(0)
                if first != '':
                    tmp[first] = ' '.join(ele)
        promptSentence2D[dirName] = tmp
# how many searches failed
outFilePath = open(mfcsFilePath, 'r').readlines()
outFilePath = [x.replace('.mfc', '.out').rstrip() for x in outFilePath]
failedNum = 0
totalNum = 0
predictedSentence2D = {}
preDir = ''  # os.path.dirname(outFilePath[0]).split('/')[-1]
tmp = {}
for i in outFilePath:
    dirName = os.path.dirname(i).split('/')[-1]
    if preDir == '':
        preDir = dirName
    currTargetTrack = os.path.basename(i).split('.')[0]
    if file_exist(i):
        content = open(i, 'r').read()
        if dirName == preDir:
            if re.search('<search failed>', content):
                targetSentence = '<search failed>'
            else:
                targetSentence = re.search(re.escape('<s> ') + "(.*?)" + re.escape(' </s>'), content).group(1)
            tmp[currTargetTrack] = targetSentence
        if dirName != preDir or i == outFilePath[-1]:
            predictedSentence2D[preDir] = tmp
            preDir = dirName
            tmp = {}
            if re.search('<search failed>', content):
                targetSentence = '<search failed>'
            else:
                targetSentence = re.search(re.escape('<s> ') + "(.*?)" + re.escape(' </s>'), content).group(1)
            tmp[currTargetTrack] = targetSentence
        totalNum += 1
        if re.search('<search failed>', content):
            failedNum += 1
print 'out of ' + str(totalNum) + ' processed files, ' + str(failedNum) + ' failed'
# global alignment of two sentences
# in this case, it aligns the prompt sentence and the predicted sentence
# Create sequences to be aligned.
fhIn = open(validationPath, 'w')
totalInsertion = 0
totalDeletion = 0
totalReplacement = 0
totalMatch = 0
totalLength = 0
for dir in predictedSentence2D.keys():
    for track in predictedSentence2D[dir].keys():
        prom = promptSentence2D[dir][track].lower()
        pred = predictedSentence2D[dir][track].lower()
        totalLength += len(prom.split())
        if not predictedSentence2D[dir][track] == '<search failed>':
            matched = stringMatching(prom.split(), pred.split())
            # calculate statistics
            insert = [x for x in matched[2] if x == 'I']
            delete = [x for x in matched[2] if x == 'D']
            replace = [x for x in matched[2] if x == 'R']
            match = [x for x in matched[2] if x == 'M']
            totalInsertion += len(insert)
            totalDeletion += len(delete)
            totalReplacement += len(replace)
            totalMatch += len(match)
            # write the aligned pair
            line1 = 'PROMPT: ' + dir + '\t' + track + '\t' + prom + '\t' + ' '.join(matched[0]) + '\t' + ' '.join(matched[2])
            line2 = 'RESULT: ' + dir + '\t' + track + '\t' + pred + '\t' + ' '.join(matched[1])
            fhIn.write(line1 + '\n')
            fhIn.write(line2 + '\n')
fhIn.close()
totalError = totalReplacement + totalInsertion + totalDeletion
print 'Total Match: ' + str(totalMatch) + '\t' + str(float(totalMatch) / totalLength * 100) + '%'
print 'Total Insertion: ' + str(totalInsertion) + '\t' + str(float(totalInsertion) / totalLength * 100) + '%'
print 'Total Deletion: ' + str(totalDeletion) + '\t' + str(float(totalDeletion) / totalLength * 100) + '%'
print 'Total Replacement: ' + str(totalReplacement) + '\t' + str(float(totalReplacement) / totalLength * 100) + '%'
...
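The evaluation block above accumulates totalInsertion, totalDeletion, totalReplacement and totalMatch from the I/D/R/M labels returned by stringMatching, and computes totalError, but it only prints the per-category percentages. The short sketch below is not part of the original script; it shows how those same counts would combine into a single word error rate, assuming (as above) that errors are counted against the number of prompt words, totalLength.

# Hedged sketch only -- not from the original script.
# WER = (substitutions + deletions + insertions) / number of reference words.
def word_error_rate(insertions, deletions, replacements, reference_length):
    if reference_length == 0:
        return 0.0
    return float(insertions + deletions + replacements) / reference_length * 100

# e.g., with the totals computed above:
# print 'WER: ' + str(word_error_rate(totalInsertion, totalDeletion, totalReplacement, totalLength)) + '%'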
main.py
Source:main.py
...
    system('exit')
else:
    others, success, error, info, reset = helpers.MessagesColors.values()
    historic = []

    def command_line():
        arr_command = []
        command = input('\n $~ ')
        arr_command = command.split()
        if arr_len(arr_command, 0):
            if arr_command[0] == 'ac':
                if arr_len(arr_command, 1):
                    if arr_command[1] == 'emerg':
                        if arr_len(arr_command, 2):
                            if arr_command[2] == 'c' or arr_command[2] == 'r':
                                opt = arr_command[2]
                                if arr_len(arr_command, 3):
                                    if arr_command[3] == '?':
                                        print(f'\n >>{info} Possible arguments: Customer ID{reset}')
                                        command_line()
                                    elif arr_command[3] != '':
                                        id = arr_command[3]
                                        try:
                                            msg = emergency.generate_emergency_services(opt, id)
                                            print(f'\n >> {success}{msg}{reset}')
                                            historic.append(''.join(msg))
                                        except Exception as err:
                                            print(f'\n >>{error} An Error has occurred!{reset}\t{info}\n {err}{reset}')
                                            command_line()
                                        command_line()
                                    else: message(2)
                                else: message(2)
                            elif arr_command[2] == '?':
                                print(f'\n >>{info} Possible arguments: c (Technical Visit) | r (Equipment Removal){reset}')
                                command_line()
                            else: message(1)
                        else: message(2)
                    elif arr_command[1] == 'servs':
                        try:
                            msg = services.generate_services()
                            print(f'\n >> {success}{msg}{reset}')
                            historic.append(''.join(msg))
                        except Exception as err:
                            print(f'\n >>{error} An Error has occurred!{reset}\t{info}\n {err}{reset}')
                            command_line()
                        command_line()
                    elif arr_command[1] == 'logins':
                        try:
                            msg1 = logins.generate_logins()
                            print(f'\n >> {success}{msg1}{reset}')
                            msg2 = RL.register_logins()
                            print(f'\n >> {success}{msg2}{reset}')
                            historic.append(''.join(msg1))
                            historic.append(''.join(msg2))
                        except Exception as err:
                            print(f'\n >>{error} An Error has occurred!{reset}\t{info}\n {err}{reset}')
                            command_line()
                        command_line()
                    elif arr_command[1] == 'times':
                        if arr_len(arr_command, 2):
                            region = arr_command[2]
                            if region != '?':
                                try:
                                    table = time_services.generate_time_service_list(region)
                                    print(f'\n{success}{table}{reset}')
                                except Exception as err:
                                    print(f'\n >>{error} An Error has occurred!{reset}\t{info}\n {err}{reset}')
                                    command_line()
                                command_line()
                            elif arr_command[2] == '?':
                                print(f'\n >>{info} Possible arguments: Region{reset}')
                                command_line()
                            else: message(1)
                    elif arr_command[1] == 'provis':
                        if len(arr_command) < 3:
                            try:
                                msg = provisioning.generate_customers_info()
                                print(f'\n >> {success}{msg}{reset}')
                                historic.append(''.join(msg))
                            except Exception as err:
                                print(f'\n >>{error} An Error has occurred!{reset}\t{info}\n {err}{reset}')
                                command_line()
                        else:
                            try:
                                infos = '{} {} {} {} {} {}'.format(arr_command[2], arr_command[3], arr_command[4], arr_command[5], arr_command[6], arr_command[7])
                                msg = provisioning.generate_provisioning(infos)
                                print(f'\n >> {success}{msg}{reset}')
                                historic.append(''.join(msg))
                            except Exception as err:
                                print(f'\n >>{error} An Error has occurred!{reset}\t{info}\n {err}{reset}')
                                command_line()
                        command_line()
                    elif arr_command[1] == 'ic':
                        if arr_len(arr_command, 2):
                            id = arr_command[2]
                            if id != '?':
                                try:
                                    msg = customer.show_customer_infos(id)
                                    print(f'\n {success}{msg}{reset}')
                                    historic.append(''.join(msg))
                                except Exception as err:
                                    print(f'\n >>{error} An Error has occurred!{reset}\t{info}\n {err}{reset}')
                                    command_line()
                                command_line()
                            elif arr_command[2] == '?':
                                print(f'\n >>{info} Possible arguments: Customer ID{reset}')
                                command_line()
                            else: message(1)
                        else: message(2)
                    elif arr_command[1] == 'sheets':
                        if arr_len(arr_command, 2):
                            date = arr_command[2]
                            if date != '?':
                                try:
                                    msg = DS.download_sheets(date)
                                    print(f'\n >> {success}{msg}{reset}')
                                    historic.append(''.join(msg))
                                except Exception as err:
                                    print(f'\n >>{error} An Error has occurred!{reset}\t{info}\n {err}{reset}')
                                    command_line()
                                command_line()
                            elif arr_command[2] == '?':
                                print(f'\n >>{info} Possible arguments: Services Date{reset}')
                                command_line()
                            else: message(1)
                        else: message(2)
                    elif arr_command[1] == 'sched':
                        try:
                            msg = OS.os_scheduling()
                            print(f'\n >> {success}{msg}{reset}')
                            historic.append(''.join(msg))
                        except Exception as err:
                            print(f'\n >>{error} An Error has occurred!{reset}\t{info}\n {err}{reset}')
                            command_line()
                        command_line()
                    elif arr_command[1] == 'atend':
                        if arr_len(arr_command, 2):
                            customer_qtd = arr_command[2]
                            if customer_qtd != '?':
                                try:
                                    CA.transfer_attendaces(customer_qtd)
                                except Exception as err:
                                    print(f'\n >>{error} An Error has occurred!{reset}\t{info}\n {err}{reset}')
                                    command_line()
                                command_line()
                            elif arr_command[2] == '?':
                                print(f'\n >>{info} Possible arguments: Number of Customers{reset}')
                                command_line()
                            else: message(1)
                        else: message(2)
                    elif arr_command[1] == '?':
                        print(f'\n >>{info} Possible arguments: emerg | servs | logins | times | provis | ic | sheets | sched | atend {reset}')
                        command_line()
                    else: message(1)
                else: message(2)
            elif command == 'exit': return
            elif command == 'hist':
                if len(historic) == 0:
                    print(f'\n >> {info}There is no history to display.{reset}')
                else:
                    for msg in historic:
                        print(f'\n {info}{msg}{reset}')
                command_line()
            elif command == 'clear':
                system('cls')
                command_line()
            else: message(1)
        else: command_line()

    def message(msg):
        if msg == 1: print(f'\n >> {error}Error: Command not recognized.{reset}')
        if msg == 2: print(f'\n >> {error}Error: Arguments are missing.{reset}')
        command_line()

    def arr_len(arr, num):
        if len(arr) > num: return True
...
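One design note on main.py above: every branch re-enters command_line() recursively, so each handled command adds a stack frame and a very long interactive session can eventually approach CPython's default recursion limit (roughly 1000 frames). Below is a hedged, hypothetical sketch of the same prompt loop written iteratively; run_repl and handle_command are illustrative names and are not part of the original file.

# Hypothetical alternative, not from main.py: an iterative prompt loop.
def run_repl(handle_command):
    # handle_command: any callable that processes one command string
    while True:
        command = input('\n $~ ')
        if command == 'exit':
            return
        handle_command(command)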